APPLIED 
LINEAR ALGEBRA 
with APL 




Garry Helzer 
UNIVERSITY OF MARYLAND 


Digitized by the Internet Archive 
in 2023 with funding from 
Kahle/Austin Foundation 


LITTLE, BROWN AND COMPANY
Boston Toronto


https://archive.org/details/appliedlinearalg0000helz 


Preface 


My purpose in this book is to provide material of immediate practical use 
while building upon and strengthening the mathematical skills the reader brings 
to it. The book exhibits some continuity in both style and content with previous
work while providing the practical details necessary for application. 

The book's readers will come from varying backgrounds — for linear algebra 
is important not only in the physical sciences and engineering but also in economics,
biology, and, through multivariate statistics, in any field where large amounts
of data are gathered and analyzed. These readers share the problem of processing 
their data in computers. Chapter 1 begins on this common ground, introducing
vectors and matrices as data structures suitable for storing and manipulating sets
of data. This gets us directly into Rⁿ without any reference to higher-dimensional
geometry. Connections are made with previous mathematics via evaluation of 
sums and approximation of integrals. 

Chapter 2 develops matrix algebra in the manipulative spirit of secondary 
school algebra. This concrete and familiar approach permits greater depth: one- 
sided inverses and least-squares solutions are discussed. Deeper proofs are post- 
poned until Chapter 3. 

Linear dependence is first mentioned in Chapter 2 in connection with the 
existence of left inverses. Thereafter the concept occurs from time to time 
throughout the text. There is no abstract chapter on linear dependence and di- 
mension theory. 

Restricting the discussion to Rⁿ and emphasizing matrix algebra allows one to
deal with a class of functions that is both wider and less abstract than the set of 
linear transformations. Affine and quadratic functions are introduced in Chapter 
2 and dealt with throughout the book. This makes the applications more natural 
and accessible. Newton’s method in higher dimensions and the linearization of
nonlinear differential equations, for example, are quite natural in the affine con-
text, whereas phrasing these procedures in purely linear terms raises barriers to
understanding.

By the end of Section 2.4 the reader should be able to quickly produce solu- 
tions to well-behaved systems of, say, thirty equations in thirty unknowns or to fit 
a cubic polynomial to a couple of hundred data points.

General (singular) systems of linear equations are taken up in Chapter 3. The 
properties of the row-reduced echelon form of a matrix developed here are the
basis of the development of flats (affine subspaces), coordinate systems, and
subspaces given in Chapter 4. 

Chapter 4 begins with the analytic geometry of lines, planes, and conics in R²
and R³. The discussion of conics, which is restricted to R², is needed for the
diagonalization of symmetric matrices and the second-derivative test for max-min 
in Chapter 5. The lines and planes are generalized to their higher-dimensional 
versions and connected with sets of solutions of systems of linear equations. Ma-
trix algebra is used to define coordinate systems on subspaces of Rⁿ and derive
coordinate-change formulas for affine and quadratic functions. Only then are the 
more abstract notions of subspace and basis introduced. 

Distance and angle are introduced in Chapter 5 and orthonormal coordinate 
systems are defined. Then symmetric matrices are diagonalized via the Jacobi
algorithm, and some eigenvalue theory is developed. This allows readers inter- 
ested mainly in statistical applications to avoid the general eigenvalue theory of 
Chapter 7. 

Perpendicular projection is emphasized and the least-squares computations 
of Chapter 2 are justified.

In some texts, linear programming is presented as a strange, out-of-the-blue 
trick (the simplex algorithm) that can be used to solve dog-food-mixing problems. 
Chapter 6 emphasizes the geometric aspects of the problem and offers a broader 
view of the possible applications and methods of solution. 

Chapter 7 emphasizes the geometric aspect of determinants, develops the 
standard facts about eigenvalues, and closes with some reasonably efficient func- 
tions for computing eigenvalues. A reader finishing Section 7.7 should have little 
trouble analyzing the eigenvalues of matrices of small order (up to 20 by 20, say).

The ability to get answers to nontrivial problems is what makes practical 
application possible. This ability is provided by the APL system. It would be 
wrong, however, to conclude that this is the sole function of the APL notation or 
that “programming” is being taught at the expense of “mathematics.” The “pro-
gramming” is here to reinforce and aid the mathematics. It is a teaching tool. It is
not something apart from the course’s mathematical content.

This reinforcement is possible because, in spite of its name,† APL is not really
a programming language at all. It is a system of mathematical notation developed
for the purpose of expressing algorithms symbolically. 

If you cannot express an algorithm precisely — that is, symbolically — then 
your understanding of it is in some measure incomplete. “You claim you under- 
stand the mathematics? Show me the output.” 

In this text the use of APL aids abstract understanding in several ways. The 
APL language is based on the idea of a function. In APL one does not write 
programs, one defines functions. It can be used to teach the idea that a function is 
a rule, one that cannot necessarily be expressed as an algebraic formula.

Also, APL is used in this book to teach mathematical induction. There are few 
loops in this text; instead, functions are written recursively. “Do you understand 
induction? Why doesn’t your function work?” 


†APL stands for A Programming Language.

Proofs lose some of their abstract character when we see that they often 
describe algorithms that may be coded more or less directly into functions. The 
proof that Gaussian reduction can be used to produce a row echelon form, for 
example, is coded directly into an APL function that takes a matrix and computes 
its row echelon form. 

This approach has certain implications for both “programming style” and the 
treatment of numerical analysis.

The programming in this book is completely functional. All the APL func-
tions take an explicit input and produce an explicit output. They do not alter their
environment. In particular, messages are not printed and global variables are 
neither used nor changed. Any of the functions may be used in any expression 
where they make mathematical sense. This allows the construction, from chap- 
ter to chapter, of extremely powerful and flexible tools for solving linear algebra 
problems. 

Since the computation is designed to help the learning process and since a 
large portion of the theory is based upon the properties of the row echelon form of 
a matrix, our Gaussian reduction function computes the row-reduced echelon 
form of a matrix instead of, say, the LU factorization. This is in spite of the 
theoretical savings in computation time offered by such methods. For the same 
reason we are not particularly concerned about operation counts. We are inter-
ested in avoiding codings that cause the computational expense to grow exponen-
tially, but this is about the extent of our concern.

The style of computation envisioned here is personal computing.† Operation
counts are less vital when letting your microcomputer run all weekend costs a 
negligible fraction of your electricity bill. For the most part, FORTRAN-based 
cost estimates are meaningful for higher-level processors, such as APL, only when 
applied to large problems. The coding of the QR algorithm used in Chapter 7 runs 
much faster than would APL implementations of current state-of-the-art FOR- 
TRAN code for matrices of the size considered in this text. (Further, as of this 
writing at least one APL vendor has implemented an eigenvalue solver as a primi- 
tive APL function and extended domino to arbitrary matrices.) 

Accuracy is a different matter. Error estimates are beyond the scope of this 
text. As a consequence, we tend to answer the question, “How accurate is your 
answer?” with a second question, “What do you mean by an ‘answer'?” Given a 
purported answer x̄ to a linear system Ax = b, we accept it as an “answer” if
b − Ax̄ is sufficiently small. This spot checking is perfectly feasible in the interac-
tive APL environment. The more subtle question of how close such an x̄ need be
to the “real” answer must be left for more advanced work. The practice with 
actual computation, however, is excellent experience for more advanced work. 
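As a rough illustration of this spot check, the following Python sketch (not APL) computes b − Ax̄ for a purported answer x̄ and accepts x̄ when every component is small. The function names `residual` and `is_acceptable` and the tolerance `tol` are hypothetical choices, since the text says only “sufficiently small.”

```python
# Sketch: spot-check a purported solution x_bar of A x = b by its residual.
# Matrices are lists of rows; tol is an assumed threshold for "sufficiently small".

def residual(A, x_bar, b):
    """Return the vector b - A x_bar."""
    return [b_i - sum(a_ij * x_j for a_ij, x_j in zip(row, x_bar))
            for row, b_i in zip(A, b)]


def is_acceptable(A, x_bar, b, tol=1e-9):
    """Accept x_bar when every component of b - A x_bar is smaller than tol."""
    return max(abs(r) for r in residual(A, x_bar, b)) < tol


A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [3.0, 5.0]
x_bar = [0.8, 1.4]                       # solves 2x + y = 3, x + 3y = 5

print(is_acceptable(A, x_bar, b))        # True
print(is_acceptable(A, [1.0, 1.0], b))   # False: the second residual is 1.0
```

Whether 1e-9 is “small” depends on the scale of A and b. The check says nothing about how close x̄ is to the exact solution, which matches the limitation stated above.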

I am indebted to P. E. Hagerty for advice and criticism on the use of APL, to
Ellen Correl, Hsin Chu, Jerome Dancis, and Avron Douglis who taught from 
preliminary versions of this book, and especially to the University of Maryland 
Mathematics Department technical typist staff, Berta Casanova, Cindy Black, 
June Slack, and Linda Fiori, who typed so many drafts of the manuscript. 


†Several popular microcomputers may be equipped with APL interpreters for a few hundred dollars.


DEPENDENCY DIAGRAM

[Figure: dependency diagram of the sections of the book, including the optional sections 2.5*(MC), 2.6*(MC), 5.3*(MC), and 5.6*.]

Sections within boxes to be covered consecutively
MC = multivariate calculus
* = optional section


Instructor’s guide 


This text may be used for a variety of linear algebra courses at a variety of 
levels. The sections marked by asterisks are optional. No later sections not marked 
by asterisks depend upon them. The sections labeled “multivariate calculus” as- 
sume some familiarity with partial differentiation. A dependency diagram is pro- 
vided. 


A sophomore-level course. This course proceeds straight down the dependency 
diagram from Section 1.1 to Section 7.7, omitting Chapter 6 and the optional 
sections. Class-time discussion of Sections 3.6 and 5.5 may be omitted and the 
results of these sections provided as library functions. If time permits, optional
Sections 3.6* and 7.3* or 7.4* may be added.


A junior/senior-level course emphasizing physical applications. This course con-
sists of the sophomore-level course plus the optional multivariate calculus Sec-
tions 2.5*, 2.6*, and 5.3*. Since such courses typically have fewer class meetings
per semester than a sophomore-level course, this material is often all that can be 
covered comfortably. If time permits, however, Sections 7.6*, 3.7*, and possibly
3.8* may be added.


A junior/senior-level course emphasizing nonphysical applications. This course
consists of the sophomore course with Chapter 6 replacing Chapter 7. This scheme
avoids complex numbers but covers eigenvalues for symmetric matrices, the
main statistical application. If time permits, Section 7.1 (determinants) should be 
added. 


A second-semester course on applications or mathematical models. This course
assumes a previous sophomore-level course in linear algebra and consists of 
Chapter 6 and the optional sections other than 7.4*. If the sophomore course does 
not use this text, then Sections 1.1, 3.1, and 3.4 are necessary for APL background. 
Additional APL operations such as matrix multiplication and inversion (domino) 
may be quickly explained as needed. If the students’ previous course emphasized 
linear transformations, then a rapid run through Sections 1.2, 1.3, 2.2, and 2.3 will 
improve their grasp of matrix algebra. 




Contents


CHAPTER ONE
Vectors, Matrices, and APL 1

1.1 Some APL Notation 2
1.2 Vectors 14
1.3 Matrices 30


CHAPTER TWO
Matrix Algebra 51

2.1 Matrix Multiplication 51
2.2 Inverse Matrices 63
2.3 Matrix Algebra 80
2.4 Affine Functions, Quadratic Forms 95
2.5* (Multivariate Calculus) Derivatives, Maxima and Minima 106
2.6* (Multivariate Calculus) Linearization, Newton's Method 113


CHAPTER THREE
Systems of Linear Equations 121

3.1 APL Functional Notation 121
3.2 Solving General Linear Equations 132
3.3 The Echelon Form of a Matrix 154
3.4 Branching and Recursion 166
3.5 Automating Gaussian Reduction 175
3.6* Powers of Matrices 181
3.7* Nonlinear Equations 185
3.8* Natural Cubic Splines (A Symmetric Tridiagonal System) 191


CHAPTER FOUR
Geometry and Coordinate Systems 200

4.1 Geometric Vectors, Lines, and Planes 201
4.2 Coordinate Systems in the Plane and Space 221
4.3 Quadratic Functions in the Plane 238
4.4 Flats and Coordinate Systems in Rⁿ 250
4.5 Subspaces 267


CHAPTER FIVE
Orthogonality 280

5.1 Distance and Angle
5.2 The Diagonalization of Symmetric Matrices 301
5.3* (Multivariate Calculus) Optimization—the Second-Derivative Test 318
5.4 Perpendicular Projections and Least Squares 323
5.5 The Householder Algorithm (Automatic Orthonormalization) 342
5.6* Inertia and Principal-Component Analysis (Rayleigh’s Principle) 361


CHAPTER SIX
Linear Programming 375

6.1 Examples of Linear Programming Problems 375
6.2 The Geometry of Linear Programming 395
6.3 The Simplex Algorithm
6.4* Sociobiology, Game Theory, and Evolution 431


CHAPTER SEVEN
Eigenvalues and Eigenvectors 448

7.1 Determinants 449
7.2 Eigenvalues and Eigenvectors 470
7.3* Powers of Matrices Revisited 484
7.4* Congruences and Affine Transformations in Space 493
7.5 Estimating Eigenvalues (Gerschgorin’s Theorem) 502
7.6* Linear Differential Equations 511
7.7 The QR Algorithm 519


APPENDIX A Answers to Selected Exercises 542
APPENDIX B A Short List of APL Functions 575
APPENDIX C Some Miscellaneous APL 579


CHAPTER ONE 


Vectors, Matrices, and APL 


Because modern research methods deal with collections of numbers, some knowl- 
edge of vector and matrix concepts is almost indispensable for research in most 
fields. The numbers dealt with may be, for example, the tabulated results of a 
questionnaire or of many repetitions of a laboratory measurement. Sometimes the 
numbers do not come from measurements but are generated in a calculation, as
during computer solutions of differential equations. Matrix algebra provides a
convenient formalism for handling groups of numbers, and this formalism is com-
monly used to process such data by digital computers.

In fact, the most powerful scientific computers built since about 1980 are 
often referred to as vector machines. They are designed to perform a group of 
tasks simultaneously (parallel processing) instead of taking up each individual 
task of the group in order (sequential processing). 

In this chapter vectors and matrices are introduced as data structures — 
devices for organizing and manipulating collections of numbers. The chapter also 
introduces the APL notational system. 

Although APL is usually described as a programming language (in fact, the
acronym APL is derived from the phrase “A Programming Language”), it origi-
nated as a formal notation for describing algorithms.† It is a mathematical nota-
tion that can be directly executed by a computer equipped with an APL processor.
In effect an APL processor turns a computer into a (very powerful) calculator. As 
a result many difficult calculations may be carried out on such a machine without 
resort to programming. Programming (or, more properly, function definition)
is taken up in a later chapter.

As we shall see, the APL notation is particularly apt when parallel or vector 
processing is involved. 

Section 1.1 is devoted to elementary arithmetic in APL notation. In Section 
1.2 vectors (singly subscripted arrays) are introduced and the APL operations are 
extended to vectors. In Section 1.3 matrices (doubly subscripted arrays) are di 
fined and the APL operations are further extended to this case. 


†K. E. Iverson, A Programming Language (New York: Wiley, 1962).




1.1 Some APL Notation 


This section introduces the APL system of mathematical notation. This system has 
been designed to execute directly on computers, avoiding the usual intermediate 
step of first translating the mathematics into a “program” and then running the 
program on a computer. 

Performing calculations in APL is similar to using a hand calculator. In fact,
an APL processor turns a computer into a very effective calculator. This machine
orientation gives rise to a notation that differs in several respects from the conven-
tional. For example, one writes X*2 for X² and 3×4 for 3·4. Other differences will
become apparent as the work proceeds; some of these are as difficult to assimilate 
as the change from miles to kilometers. The advantages, however, outweigh the 
disadvantages. 

In the text proper we will consider the use of the APL symbolism but not the 
practical details of connecting with a computer, controlling printout format, and
so on. Some hints are given in Appendix C, however, and the various APL func-
tion symbols are listed in Appendix B.

The basic operations of arithmetic (addition, subtraction, multiplication,
and division) are denoted by +, -, ×, ÷, respectively. In APL an expression is
regarded as an instruction to perform the indicated operations. For example, if
you type 2+3 (followed by a carriage return), the computer types 5 at the left
margin and then moves the typehead in seven spaces on the next line to signal
that it is ready for the next command.


      2+3
5

Here are more examples.

      3-2        you type 3-2, the computer types 1
1
      2×3        you type 2×3, the computer types 6
6
      10÷5       you type 10÷5, the computer types 2
2


In the representation of negative numbers the APL notation for basic arith-
metic differs from conventional notation. In APL negative numbers are indicated
not by the subtraction sign but by a separate symbol, the “high minus.”

      2-3        3 subtracted from 2 is negative 1
¯1
      2+¯3       2 plus negative 3 is negative 1
¯1
      2-¯3       negative 3 subtracted from 2 is 5
5




The subtraction symbol, -, is at the upper right of the APL keyboard, whereas
the high minus ¯ is at the upper left (shift-2). The reason for the
high minus will become apparent when vectors are discussed in Section 1.2.

Machine computations are rarely exact. In fact, most numbers cannot even be 
stored in the machine exactly. The number of digits a computer can handle varies 
with the manufacturer. Standard APL systems will display ten digits unless told 
otherwise. 


      2÷3
0.6666666667


This is too many digits for most purposes, and so most examples are displayed
to three digits.

      )DIGITS 3
WAS 10
      2÷3
0.667

The )DIGITS is an example of an APL system command. It sets the number of
digits in the displayed answer. It does not affect the accuracy of the calculation
in any way.† System commands such as )DIGITS are dealt with in Appendix C.
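The difference between the stored value and the displayed value can be imitated in Python. The sketch below assumes that format strings play the role of )DIGITS. The number kept in memory stays the same whichever display precision is chosen.

```python
# Display precision versus stored value: formatting changes only what is shown.
x = 2 / 3

shown_10 = f"{x:.10g}"   # ten significant digits, like the default display
shown_3 = f"{x:.3g}"     # three significant digits, like the effect of )DIGITS 3

print(shown_10)          # 0.6666666667
print(shown_3)           # 0.667
print(x == 2 / 3)        # True: the stored number itself is unchanged
```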

The APL language has many arithmetic functions in addition to +, -, ×, and ÷.
A complete list of APL functions is in Appendix B. Here are a few that we shall
use in subsequent examples.

      2*2        a*b is APL notation for aᵇ. Notice that √a is a*.5.
4
      2*.5       * is called exponentiation.
1.414213562
      !3         ! is the factorial function. Notice that in APL one writes
6                “!3” rather than “3!”. (The exclamation point does not
      !6         appear on the APL keyboard. It is typed as quote-back-
720              space-period.)
      ⌊2.5       ⌊ is called floor. ⌊a is the greatest integer that is less than
2                or equal to a (sometimes called “the greatest integer func-
      ⌊¯3.5      tion”).
¯4
      |¯3        |a is the absolute value of a.
3


†The examples in this text were all computed on a UNIVAC 1100/42 using the APL-1100 processor
developed at the University of Maryland by P. E. Hagerty. This processor does most calculations in
double-precision floating-point arithmetic, which for UNIVAC machines is about eighteen decimal
digits. Other machines and processors may produce slightly different answers.




VARIABLES 


The phrase “Let A denote the number 3” is written in APL symbols as A←3. One
may then use the letter A anywhere that the number 3 would be appropriate.


      A←3
      1+A
4
      A*A
27
      A          (Display A)
3


A is called a variable. To display the value of a variable, type its name. This
has been done in the last two lines above. Variable names are not restricted to
single letters. They may be almost any combination of letters, digits, and the
symbol Δ.


Here are more examples.

      SAM←A
      A×SAM
9
      A←A+1
      A
4

Look at the example A←A+1 carefully. It reads from right to left: “Take A, add 1
to it, and call the result A,” or “Increment A by 1.”


MONADIC AND DYADIC FUNCTIONS


In the APL notation most symbols denote two different (usually related) func- 
tions. Here is an example. 


      !6         !n is n(n−1)(n−2)···2·1, the factorial function.† k!n is
720              the binomial coefficient $\binom{n}{k}$ (the number of subsets of size k
      2!5        that may be chosen from a set of size n).
10

†In fact, !m is $\Gamma(m+1)$, where Γ denotes the gamma function. This means k need not be an integer.
The function k!n is closely related to $\Gamma(n+1)/\left(\Gamma(k+1)\,\Gamma(n-k+1)\right)$.
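The footnote's gamma-function description can be checked numerically with a short Python sketch. The name `apl_binomial` is hypothetical; in APL, k!n is simply the primitive. For whole numbers `math.gamma(m + 1)` reproduces !m, and the same quotient gives 2!5 = 10 while also accepting a non-integer k.

```python
import math


def apl_binomial(k, n):
    """Sketch of APL's k!n via the gamma function:
    Gamma(n+1) / (Gamma(k+1) * Gamma(n-k+1)). k and n need not be integers."""
    return math.gamma(n + 1) / (math.gamma(k + 1) * math.gamma(n - k + 1))


print(math.gamma(6 + 1))      # !6 via the gamma function: 720.0
print(apl_binomial(2, 5))     # 2!5, subsets of size 2 from a set of 5: 10.0
print(apl_binomial(0.5, 1))   # a non-integer k still gives a value (4/pi)
```

The gamma function has poles at 0, −1, −2, …, so this quotient is undefined when any of its arguments lands on one of them.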




A monadic function has a single argument, which is written to the right of the
function. If f is a monadic function, we write fa in APL for the more conventional
f(a). One can write f(a) if one wishes. The parentheses are not wrong; they are
simply unnecessary.


      !(6)
720


A dyadic function has two arguments. One is written on the left and the other
on the right. If f is a dyadic function, we write afb in APL for the conventional
f(a, b). Unlike the monadic case, the conventional notation f(a, b) cannot be used
instead of afb. The symbols f(a, b) mean something quite different in APL.

Dyadic functions seem a bit strange until we realize that the basic arithmetic
functions +, -, ×, ÷ are all dyadic. We conventionally write a+b, a×b, a−b rather
than the “f(a, b)” forms +(a, b), ×(a, b), −(a, b). One could keep the dyadic form for
the functions +, ×, −, ÷, say, and use the f(a, b) form for all other functions of two
variables. But the existence of special cases would complicate the automatic ma-
chine execution of formulas.

The need to treat all functions in the same way is also the reason one writes !6
rather than 6!, |¯3 rather than |¯3|, and so on.

Here are some more monadic-dyadic forms.

      2*3        a*b is aᵇ.
8
      2*.5
1.414213562
      *1         *a is eᵃ, where e is the base of natural logarithms. Since
2.718281828      e¹ = e, the number e may be obtained as *1.
      *.5
1.648721271
      *¯1
0.3678794412
      ⌊3.56      ⌊a is the floor of a, defined previously.
3
      3⌊4        a⌊b is the minimum of the two numbers a and b. Since
3                2*¯1 is 0.5, the second computation gives the minimum of
      2⌊(2*¯1)   2 and 0.5.
0.5

The symbols +, -, ×, ÷ denote monadic functions as well as the usual arithmetic
functions. Two important ones are monadic ÷ and monadic -.

      ÷2         ÷a is 1÷a, the reciprocal of a.
0.5
      -3         -a is the negative of a. Notice that the high minus ¯ does
¯3               not denote a function. It is a part of a number, like a
      ¯A         digit or decimal point.
LEXICAL ERR
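A small lookup table can model the idea that one symbol names two related functions. This Python sketch is an illustration under our own assumptions. The dictionary `APL_FUNCTIONS` and the helpers `monadic` and `dyadic` are hypothetical names. The table covers the symbols discussed above plus monadic ⌈, which is APL's ceiling.

```python
import math

# Hypothetical table: each APL symbol maps to (monadic meaning, dyadic meaning).
APL_FUNCTIONS = {
    "-": (lambda b: -b, lambda a, b: a - b),      # negation / subtraction
    "÷": (lambda b: 1 / b, lambda a, b: a / b),   # reciprocal / division
    "*": (math.exp, lambda a, b: a ** b),         # e to the b / power
    "⌊": (math.floor, min),                       # floor / minimum
    "⌈": (math.ceil, max),                        # ceiling / maximum
}


def monadic(symbol, b):
    """One-argument meaning of symbol, as in the APL form  symbol b."""
    return APL_FUNCTIONS[symbol][0](b)


def dyadic(a, symbol, b):
    """Two-argument meaning of symbol, as in the APL form  a symbol b."""
    return APL_FUNCTIONS[symbol][1](a, b)


print(monadic("÷", 2))          # APL ÷2 gives 0.5
print(monadic("*", 1))          # APL *1, the number e
print(dyadic(2, "⌊", 2 ** -1))  # APL 2⌊(2*¯1) gives 0.5
```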


EXAMPLE 1.1 Write the expression $\sqrt{1+2}$ in APL notation.
Solution (1+2)*.5 or (1+2)*(÷2)
EXAMPLE 1.2 Write the expression $|-3|^3$ in APL notation.

Solution (|¯3)*3 or (|-3)*3

In the first expression the high minus is used to indicate a negative 3. In the
second expression the monadic - is used to change 3 to ¯3.

EXAMPLE 1.3 Write the expression $1+e^{\sqrt{5}}$ in APL notation.

Solution 1+*(5*.5) or 1+*(5*(÷2))
EXAMPLE 1.4 Write the expression $\sin^2 4$ in APL, where 4 means 4 radians.

Solution From Appendix B we see that 1○x is the sine of x radians. Thus the
answer is

      (1○4)*2
EXAMPLE 1.5 Write the number π in APL notation.

Solution From Appendix B we see that ○a is π times a. Thus the solution is

      ○1




EXAMPLE 1.6 Translate the statement
“Let X be 7” or “let X = 7”
into APL notation.
Solution These are standard mathematical phrases that correspond to the APL
expression
      X←7

EXAMPLE 1.7 Translate into APL:

Set A equal to 7
and set B equal to $e^{-A}$

Solution A←7
         B←*(-A)
Notice that one must use the monadic -. The high minus ¯ cannot be used.

The solution can be written on one line:
      B←*(-(A←7))
EXAMPLE 1.8 Translate into APL:

Set D equal to $\sqrt{B^2-4AC}$

Solution D←((B*2)-4×A×C)*.5
Note: Typing this expression on the computer will result in an error message
unless A, B, and C have already been assigned values.




ORDER OF EVALUATION 


The parentheses about B*2 in the last example are necessary, although they are
not necessary in conventional notation. Furthermore, many of the parentheses in
the examples above are not necessary but are simply there for clarity. For exam-
ple, the solution to Example 1.3 could be written

      1+*5*÷2


This is because the assumed order of evaluation in APL expressions is different 
from the conventional order. 


      1+2×3
7
      2×3+1
8
      (2×3)+1
7
      2×(3+1)
8


Evaluation Rule

Start at the right-hand end of the line and work to the left, evaluating each
function as you encounter it.

In the examples above,

      1+2×3      First compute 2×3, obtaining 6;
7                then compute 1+6, obtaining 7.

      2×3+1      First compute 3+1, obtaining 4;
8                then compute 2×4, obtaining 8.

Notice that you can force any order of evaluation you prefer by using parentheses. 
Expressions in parentheses are evaluated first. When in doubt use parentheses! 


EXAMPLE 1.9 Evaluate the APL expression
      1-2-3
and translate it into conventional notation.

Solution Working from the right, we first evaluate 2-3 and then we subtract the
result from 1. Thus we have

      1-(2-3)

or 2. The conventional expression is $1-(2-3)$ or $1-2+3$.


EXAMPLE 1.10 Translate the APL expression
      2*-5
into conventional notation.

Solution Starting at the right, we first encounter -5. Notice that this is the mo-
nadic - because there is a function symbol rather than a number or variable to the
left of this -. Thus the expression is equivalent to

      2*(-5)

or $2^{-5}$.
EXAMPLE 1.11 Translate the APL expression
      2×3+4×5
into conventional notation. Evaluate.
Solution Working from the right, we have

      2×(3+(4×5))

or $2(3+4\cdot 5) = 46$.
EXAMPLE 1.12 Evaluate the APL expression
      3⌈7⌊⌊¯2.7⌈12.9
Solution Working from the right, we obtain
      (3⌈(7⌊(⌊(¯2.7⌈12.9))))

To evaluate the expression, we note from Appendix B that dyadic ⌈ is the maxi-
mum function, dyadic ⌊ is the minimum function, and monadic ⌊ is floor. Thus

      (3⌈(7⌊(⌊12.9)))

is
      (3⌈(7⌊12))

is
      3⌈7

or 7.


EXAMPLE 1.13 Translate the APL expression
      -A-B
into conventional notation.

Solution Starting from the right, we first encounter A-B; thus the expression is
$-(A-B)$ or $B-A$.


The rules for evaluating complex APL expressions may be formally stated as
follows:

RULE 1 If a function symbol has a number, variable, or expression in parentheses
to its left, then it represents a dyadic function. Otherwise it represents a monadic
function.

RULE 2 A monadic function operates on everything to its right.

RULE 3 A dyadic function operates on everything to its right and the number,
variable, or expression in parentheses to its left.
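Rules 1 through 3, together with the right-to-left evaluation rule and the high minus, are enough to evaluate simple numeric expressions mechanically. The following Python sketch is a hypothetical toy evaluator, not an APL processor. It handles only scalar numbers, parentheses, and the functions + - × ÷ * ⌊ ⌈, and leaves out variables and vectors. The names `tokenize`, `parse`, and `apl_eval` are our own. The sketch reports LEXICAL ERR when ¯ is not followed by digits.

```python
import math
import re

# A number may carry a leading high minus; ¯ belongs to the number, not a function.
NUMBER = re.compile(r"¯?(?:\d+\.?\d*|\.\d+)")
FUNCTIONS = "+-×÷*⌊⌈"

MONADIC = {"+": lambda b: b, "-": lambda b: -b, "×": lambda b: (b > 0) - (b < 0),
           "÷": lambda b: 1 / b, "*": math.exp, "⌊": math.floor, "⌈": math.ceil}
DYADIC = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "×": lambda a, b: a * b,
          "÷": lambda a, b: a / b, "*": lambda a, b: a ** b, "⌊": min, "⌈": max}


def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        if text[i] == " ":
            i += 1
            continue
        m = NUMBER.match(text, i)
        if m:
            tokens.append(float(m.group().replace("¯", "-")))
            i = m.end()
        elif text[i] in FUNCTIONS or text[i] in "()":
            tokens.append(text[i])
            i += 1
        else:  # for example ¯A: a high minus that does not start a number
            raise SyntaxError("LEXICAL ERR at " + text[i:])
    return tokens


def is_function(tok):
    return isinstance(tok, str) and tok in FUNCTIONS


def operand(tokens, i):
    """A number, or a parenthesized expression, starting at tokens[i]."""
    if i < len(tokens) and tokens[i] == "(":
        value, j = parse(tokens, i + 1)
        if j >= len(tokens) or tokens[j] != ")":
            raise SyntaxError("unbalanced parentheses")
        return value, j + 1
    if i < len(tokens) and isinstance(tokens[i], float):
        return tokens[i], i + 1
    raise SyntaxError("missing argument")


def parse(tokens, i):
    """Evaluate tokens[i:] up to a ')' or the end; return (value, next index)."""
    if i < len(tokens) and is_function(tokens[i]):
        # Rule 1: nothing stands to the left, so the symbol is monadic.
        # Rule 2: it applies to everything on its right.
        right, j = parse(tokens, i + 1)
        return MONADIC[tokens[i]](right), j
    left, j = operand(tokens, i)
    if j == len(tokens) or tokens[j] == ")":
        return left, j
    if not is_function(tokens[j]):
        raise SyntaxError("expected a function symbol")
    # Rule 3: dyadic; its right argument is everything on its right.
    right, k = parse(tokens, j + 1)
    return DYADIC[tokens[j]](left, right), k


def apl_eval(text):
    """Evaluate one line right to left, following the Evaluation Rule."""
    tokens = tokenize(text)
    value, j = parse(tokens, 0)
    if j != len(tokens):  # a stray ')' was left over
        raise SyntaxError("unbalanced parentheses")
    return value


for line in ["1+2×3", "2×3+1", "1-2-3", "2*-5", "2×3+4×5"]:
    print(line, "->", apl_eval(line))
```

The results come back as Python floats, so `apl_eval("1+2×3")` gives 7.0 and `apl_eval("1-2-3")` gives 2.0. `parse` treats a symbol as monadic only when nothing stands to its left, so `2*-5` is read as `2*(-5)`, exactly as Rule 1 requires.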


RELATIONAL AND LOGICAL FUNCTIONS


A mathematical notation that is both powerful and machine readable must allow
us to write statements normally expressed in English sentences as “formulas,”
that is, as formal expressions.

For example, the statement

Let A be the set of prime numbers between 1 and 20

can be written symbolically in APL† and processed by a computer equipped with
an APL processor.

The writing of such expressions is made possible in part by the APL meanings
of the symbols =, ≠, <, >, ≤, ≥. In APL these symbols denote dyadic functions,
called the relational functions. Here is how they work:

      1<2
1

The expression 1<2 is considered to be a question that requires a yes or no
answer: 0 means no and 1 means yes.

Again,
      1=3        Is 1 equal to 3?
0                No
      1≠3        Is 1 not equal to 3?
1                Yes


The relational functions may be used in expressions in the same manner as any 
dyadic function. 
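The yes/no character of the relational functions can be sketched in Python, along with the fact that their results are ordinary numbers that can enter further arithmetic. The `relate` helper and its symbol table are hypothetical names chosen for this illustration.

```python
import operator

# Hypothetical table: APL relational symbols and the Python comparisons they mirror.
RELATIONAL = {"=": operator.eq, "≠": operator.ne, "<": operator.lt,
              ">": operator.gt, "≤": operator.le, "≥": operator.ge}


def relate(a, symbol, b):
    """Return 1 if 'a symbol b' is true and 0 otherwise, as APL does."""
    return int(RELATIONAL[symbol](a, b))


print(relate(1, "=", 3))      # 0  (Is 1 equal to 3? No)
print(relate(1, "≠", 3))      # 1  (Is 1 not equal to 3? Yes)
print(3 + relate(1, "=", 4))  # 3  (APL 3+1=4, that is, 3+(1=4))
```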


†A←(2=+⌿0=(⍳20)∘.|⍳20)/⍳20. This is a formal description of the sieve of Eratosthenes; see
exercise 46 in Exercises 1.3.
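The footnote's APL expression works by counting divisors. Entry (i, j) of 0=(⍳20)∘.|⍳20 is 1 when i divides j. The column sums count the divisors of each j, and the numbers with exactly two divisors are kept. Here is a Python sketch of the same steps; the function name `primes_upto` is our own.

```python
def primes_upto(n):
    """Numbers in 1..n with exactly two divisors, mirroring (2=+⌿0=(⍳n)∘.|⍳n)/⍳n."""
    r = range(1, n + 1)
    # divides[i-1][j-1] is 1 when i divides j, like 0=(⍳n)∘.|⍳n
    divides = [[int(j % i == 0) for j in r] for i in r]
    # column sums, like +⌿ : how many i divide each j
    counts = [sum(column) for column in zip(*divides)]
    # keep j where the count is 2, like the compression (2=...)/⍳n
    return [j for j, c in zip(r, counts) if c == 2]


print(primes_upto(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```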




EXAMPLE 1.14 Evaluate the APL expression

      4=3+1

Solution As in any APL expression, one may begin at the right and work to the
left, evaluating each function as it is encountered. Doing this (or applying Rule 3
directly to the dyadic function =), we see that the expression is equivalent to

      4=(3+1)

or 4=4. The answer is 1.


EXAMPLE 1.15 Evaluate the APL expression

      3+1=4

Solution Again either by working from the right or by applying Rule 3, we see
that the expression is equivalent to

      3+(1=4)

This is 3+0 or 3.


EXAMPLE 1.16 Evaluate the APL expression

      4=⍟*4

Solution The expression is equivalent to 4=(⍟(*4)). From Appendix B we
see that *4 is $e^4$ and monadic ⍟ is the natural logarithm. Thus ⍟*4 is $\ln(e^4)$ or
4. The answer is 1.


The logical functions ∧, ∨, ~, ⍲, ⍱ are called and, or, not, Nand, and Nor. They
operate exclusively with the “true/false” values 1 and 0. The operation tables for
AND and OR are

      ∧ | 0 1          ∨ | 0 1
      --+----          --+----
      0 | 0 0          0 | 0 1
      1 | 0 1          1 | 1 1

The function NOT is monadic and changes 1 to 0 and 0 to 1. The table for
∧ can be read “false AND false is false,” “false AND true is false,” “true AND false
is false,” and “true AND true is true.” Similarly for OR. The operation of ~ can be
read “NOT true is false” and “NOT false is true.”


12 Vectors, Matrices, and APL 


EXAMPLE 1.17 For what values of X does the APL expression

      (X≤2)∧(X≥¯1)

result in a 1? A 0?

Solution For the expression to result in a 1, both X≤2 and X≥¯1 must result
in a 1. Thus X must be less than or equal to 2 and X must be greater than or equal
to −1. The result of the expression is a 1 if X lies in the interval [−1, 2]. Otherwise
the result is a 0.


Notice that in such expressions ∧ really means “and.” The next example
shows why the function ∧ is necessary in the expression above.


EXAMPLE 1.18 What is the result of the APL expressions?

      X←3
      2≤X≤4

Solution Using the rules for evaluating APL expressions, we see that the second
expression is equivalent to

      2≤(X≤4)

Since X is 3, X≤4 is 1 and the expression reduces to 2≤1, or 0. Thus, although
X lies in the interval [2, 4], the expression 2≤X≤4 results in a 0, or “false.” To
test whether X lies in the interval [2, 4], one must use such expressions as

      (2≤X)∧X≤4     or     (4≥X)∧2≤X
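The pitfall in Example 1.18 comes from APL's strict right-to-left evaluation. The minimal Python sketch below imitates it with 0/1 results; the helper names are hypothetical. Note that Python's own `2 <= x <= 4` chains comparisons and so behaves like the ∧ form, not like APL's `2≤X≤4`.

```python
def le(a, b):
    """APL dyadic ≤ : 1 for true, 0 for false."""
    return 1 if a <= b else 0


def apl_chain(x):
    """APL reads 2≤X≤4 as 2≤(X≤4); since X≤4 is 0 or 1, the result is always 0."""
    return le(2, le(x, 4))


def apl_interval(x):
    """(2≤X)∧X≤4 : both tests must return 1."""
    return le(2, x) & le(x, 4)


print(apl_chain(3))      # 0
print(apl_interval(3))   # 1
print(apl_interval(5))   # 0
```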


EXAMPLE 1.19 For what values of the variables A and B does the APL expression

      (~A∨B)=(~A)∧(~B)

result in a 1?

Solution In order for the expression to make sense, the variables A and B must
have values 0 or 1. Thus we are immediately reduced to four possible cases: A←0,
B←0; A←0, B←1; A←1, B←0; A←1, B←1.

Suppose A←0 and B←0. Then the expression is

      (~0∨0)=(~0)∧(~0)
or    (~0)=1∧1
or    1=1
or    1



Checking the other three cases in this way shows that the expression always
returns a 1. This means that the expression is an identity. In formal logic this
identity is known as one of De Morgan’s Laws.
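Checking all four cases by machine is a natural exercise. Here is a minimal Python sketch that evaluates both sides of Example 1.19 with 0 for false and 1 for true; `apl_not` and `de_morgan_table` are hypothetical helper names, and Python's bitwise `|` and `&` stand in for ∨ and ∧ on 0/1 values.

```python
from itertools import product


def apl_not(a):
    """APL ~ on 0/1 values: 1 becomes 0 and 0 becomes 1."""
    return 1 - a


def de_morgan_table():
    """Evaluate (~A∨B)=(~A)∧(~B) for the four cases A, B in {0, 1}."""
    rows = []
    for a, b in product((0, 1), repeat=2):
        lhs = apl_not(a | b)               # ~A∨B is ~(A∨B) in APL
        rhs = apl_not(a) & apl_not(b)      # (~A)∧(~B)
        rows.append((a, b, 1 if lhs == rhs else 0))
    return rows


for row in de_morgan_table():
    print(row)   # the last entry is 1 in every case
```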


EXERCISES 1.1 


Write the following arithmetic expressions in APL notation. All angles are in radians.
A list of APL functions may be found in Appendix B.


L vi 2 341 3. 341 4 

Sa TN 6 Ind +2 i Sa 8. 

9% (ey 10. V34+7 We V347 12. v2 

13, 7/180 14. tan7 15. sin (32/2) 16. cot7 

17 sec?3 +1 18. |-3) 19, sinh? 4 + cosh® 5 20. Tan™! (cot (3x) 
21, logyo(In3) 22. 2. ef — at 


Write the following algebraic expressions in APL notation. All angles are in radians. A list
of APL functions may be found in Appendix B.


mu, <t 25, e 26. sin3x 2. 
2% 29. bh 30. or 31 
va +h— Va 1 
yy Ieee Sere 34, Inju 35 
h vi¢x? mi 


Evaluate the APL expressions. Answers should be worked by hand and checked on a 
computer, if one is available. A list of APL functions may be found in Appendix B. 


36. 1r2K3 37. 142-3 38. 1-2-3 39. 2x3-4x7 
40. -3+2 Al 302 42, 35-2 43, 30-2 
4A et 45. +06 46. 10-10-2 47, 10° 1001 
M8. (-1-3yse2 49. (12120)-(120)~( 1 12)x1 20-12 

50. (2x607)-(+7)4*-7 SI, (543 52. 5-3 

53. (5-3)-15-3 $4. 5-(3x15-3) $5. 10004 

56. ..5+100-3 57. werararsre 58, 1P2isiarsrs 
59. 112131 4Lsi6 60. (A+ |A)-2m0CA where a is any number. 


Translate the APL expressions into conventional notation.


61. A-8-c 62. 100x 63. 2-B-C+0 64, AxB-C 
65. xeye2 66, x+2+¥2 67. 30°30x 68. 19x 
69, 1+4ax 70. on+2 71. Which is larger, e* or ="? 


What is the result of the following APL expression(s)? Answers should be worked by hand 
and checked on a computer, if one is available. A list of APL functions may be found in 
Appendix B. 




rR 73. 4-9-1 74. 4-341 7S. 144-3 
16. x3 Ti. x3 B. xs 79. Kae 
1k (1s) KS12 (4ex)vKeI2 19021K 


80. o-1=2-3-4=5 


1.2 Vectors 


In applying mathematics to real-world situations, it is often necessary to process
large quantities of data. The vector concept, although originating in the physics of
three-dimensional space, has become indispensable for the symbolic manipulation
of large quantities of data. The related concept of a matrix will be discussed
in the next section.

We begin with an important example. 


THE LEAST-SQUARES STRAIGHT LINE 


Suppose that fifteen adult males are chosen at random and their height (in inches) 
and their weight (in pounds) recorded. The resulting data are given in Table 1.1 
and plotted in Figure 1.1. 


TABLE 1.1

Height (x), inches │  60  61  62  63  64  65  66  68  69  70  71  72  74  75  76
Weight (y), pounds │ 120 120 135 135 130 135 150 140 170 145 160 160 160 160 175


FIGURE 1.1 Weight (y, in pounds) plotted against height (x, in inches) for the data of Table 1.1.


12° Vectors 15 


Assuming that weight, denoted by y, is a straight-line function of height,
denoted by x, what is the best estimate of this function?† That is,
we are assuming a relationship of the form

$$y = a_0 + a_1 x \qquad (1.1)$$

and we wish to estimate $a_0$ and $a_1$ from our data.

The estimate of the line (1.1) most often used in this situation is the least-squares
straight line. This line is characterized by the fact that $a_0$ and $a_1$ are chosen
to minimize the sum

$$\sum_{i=1}^{n} \bigl(y_i - (a_0 + a_1 x_i)\bigr)^2 \qquad (1.2)$$

where the measured data points are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Since $a_0 + a_1 x_i$
is the y coordinate of the point on the line (1.1) with x coordinate equal to $x_i$, one sees that (1.2) is the
sum $\sum d_i^2$, where $d_i$ is the vertical distance from the point $(x_i, y_i)$ to the line (1.1).

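The coefficients $a_0$, $a_1$ that minimize the expression (1.2) can be shown to be
the solutions of the normal equations:

$$\begin{aligned} n a_0 + \Bigl(\sum x_i\Bigr) a_1 &= \sum y_i \\ \Bigl(\sum x_i\Bigr) a_0 + \Bigl(\sum x_i^2\Bigr) a_1 &= \sum x_i y_i \end{aligned} \qquad (1.3)$$

where all sums are taken as i ranges from 1 to n. These equations will be proved
several times later in increasing generality (see Exercise 61 and Section 5.4).
The solution of (1.3) is given by

$$a_0 = \frac{\bigl(\sum y_i\bigr)\bigl(\sum x_i^2\bigr) - \bigl(\sum x_i\bigr)\bigl(\sum x_i y_i\bigr)}{n \sum x_i^2 - \bigl(\sum x_i\bigr)^2}, \qquad a_1 = \frac{n \sum x_i y_i - \bigl(\sum x_i\bigr)\bigl(\sum y_i\bigr)}{n \sum x_i^2 - \bigl(\sum x_i\bigr)^2} \qquad (1.4)$$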


†One would expect weight to be proportional to volume and volume to be proportional to (height)³. In
the limited range of heights of adult males in a culturally homogeneous population, however, a straight
line might be good enough.




TABLE 1.2

   x      y       xy       x²
  60    120    7,200    3,600
  61    120    7,320    3,721
  62    135    8,370    3,844
  63    135    8,505    3,969
  64    130    8,320    4,096
  65    135    8,775    4,225
  66    150    9,900    4,356
  68    140    9,520    4,624
  69    170   11,730    4,761
  70    145   10,150    4,900
  71    160   11,360    5,041
  72    160   11,520    5,184
  74    160   11,840    5,476
  75    160   12,000    5,625
  76    175   13,300    5,776
Σx = 1016    Σy = 2195    Σxy = 149,810    Σx² = 69,198

$$a_0 = \frac{(2195)(69{,}198) - (1016)(149{,}810)}{(15)(69{,}198) - (1016)^2} = -55.5$$

$$a_1 = \frac{(15)(149{,}810) - (1016)(2195)}{(15)(69{,}198) - (1016)^2} = 2.98$$


To facilitate hand computation, the data may be arranged as in Table 1.2.

The first two columns, x and y, contain the original data. The columns xy and x²
are calculated line by line from x and y. The results are then summed and $a_0$, $a_1$
calculated. The line $y = a_0 + a_1 x$ is plotted in Figure 1.1.

We can denote the first column of the table by the single variable X and the
second column by the single variable Y. Then X and Y denote vectors.


DEFINITION 1.1 A vector is an ordered list of numbers.† The individual numbers
in the list are the components of the vector.

When we say that the list is ordered, we mean that there is a first component,
a second component, and so on. In the example above the first component of X is
60, the second component is 61, and the tenth is 70. In Y the first and second
components are both 120, the seventh is 150, and the eighth is 140.


In APL notation the vectors X and Y of the table are defined by the expressions

      X←60 61 62 63 64 65 66 68 69 70 71 72 74 75 76
      Y←120 120 135 135 130 135 150 140 170 145 160 160 160 160 175


†This is probably the most widely accepted definition of the term “vector.” It comes from computer
science, where a vector generally means any singly indexed list of data, numeric or not. The original
geometric meaning of the term can be found in Chapter 4.

We display the value of a vector by listing the components separated by
blanks. Other examples of vectors in APL notation are


1273 three components 
      1 ¯2 4 7       four components; the high minus required on the second
                     component is explained below.


The number of components in a vector X is called the size of X or the
shape of X and is denoted by ⍴X. Using the above X and Y,

      ⍴X
15


VECTOR ARITHMETIC 


In APL notation the usual arithmetic operations have been extended to vectors.
If A and B are vectors of the same size, that is, if (⍴A)=⍴B is true, then
A+B is the result of adding corresponding components of A and B.


1.364 0-72 


9 67 8 

      1 2 3 4+5 6 7 8
6 8 10 12
      1 2 3 4+1 2
LENGTH ERROR
      1 2 3 4+1 2
             ^


The same is true of the other operations -, ×, ÷, and *.

      1 2 3-4 5 6
¯3 ¯3 ¯3
      1 2 3×4 5 6
4 10 18
      1 2 3÷4 5 6
0.25 0.4 0.5
      1 2 3*4 5 6
1 32 729
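The componentwise rule, including the LENGTH ERROR for vectors of different sizes, can be imitated in a few lines. This is a minimal Python sketch, assuming vectors are Python lists; `apl_dyadic` is a hypothetical helper, and raising `ValueError` is only an analogue of APL's error report.

```python
import operator


def apl_dyadic(fn, a, b):
    """Apply a scalar dyadic function componentwise to two vectors of the same size."""
    if len(a) != len(b):
        raise ValueError("LENGTH ERROR")   # APL reports a LENGTH ERROR here
    return [fn(x, y) for x, y in zip(a, b)]


print(apl_dyadic(operator.add, [1, 2, 3, 4], [5, 6, 7, 8]))   # [6, 8, 10, 12]
print(apl_dyadic(operator.sub, [1, 2, 3], [4, 5, 6]))         # [-3, -3, -3]
print(apl_dyadic(operator.mul, [1, 2, 3], [4, 5, 6]))         # [4, 10, 18]
print(apl_dyadic(operator.truediv, [1, 2, 3], [4, 5, 6]))     # [0.25, 0.4, 0.5]
print(apl_dyadic(operator.pow, [1, 2, 3], [4, 5, 6]))         # [1, 32, 729]
```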


Now let us return to the least-squares example above. The first two columns
of Table 1.2 have been stored in the variables X and Y. The next two columns
were computed as

      XY←X×Y
      XY
7200 7320 8370 8505 8320 8775 9900 9520 11730 10150 11360 11520 11840 12000 13300
      X2←X*2
      X2
3600 3721 3844 3969 4096 4225 4356 4624 4761 4900 5041 5184 5476 5625 5776


These vector multiplications are examples of parallel processing. In this case
fifteen multiplications are to be carried out and may be done in any order or
simultaneously. Operations such as X×Y may execute quite quickly on
vector-oriented machines even when X and Y are quite large. On any type of computer,
however, APL expressions will operate most efficiently when parallelism is
exploited. More important, the use of vector operations tends to simplify formulas,
making them easier to understand. Notice that here and below we have
eliminated the subscripts from Equations (1.4).

The next task in computing the least-squares straight line is summing the
components of the vectors X, Y, XY, and X2. In APL notation Σx is written
+/X and read “plus over X.” If X is any vector, then +/X is the sum of the
components of X. Thus the sums of the columns of Table 1.2 are given by the
following calculations (the monadic +, described in Appendix B, has been used to
define and display a variable on a single line):

      +SX←+/X
1016
      +SY←+/Y
2195
      +SXY←+/XY
149810
      +SX2←+/X2
69198


Now we can compute $a_0$, $a_1$ from Equations (1.4).

      +D←(15×SX2)-SX*2
5714
      +A0←((SY×SX2)-SX×SXY)÷D
¯55.53902695
      +A1←((15×SXY)-SX×SY)÷D
2.98039902
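For readers who want to check these numbers outside APL, here is a minimal Python sketch that mirrors the session line by line. It assumes the data of Table 1.1; the variable names copy the APL ones, and `n` plays the role of the constant 15.

```python
X = [60, 61, 62, 63, 64, 65, 66, 68, 69, 70, 71, 72, 74, 75, 76]
Y = [120, 120, 135, 135, 130, 135, 150, 140, 170, 145, 160, 160, 160, 160, 175]

n = len(X)                                   # ⍴X, which is 15
XY = [x * y for x, y in zip(X, Y)]           # X×Y
X2 = [x ** 2 for x in X]                     # X*2
SX, SY, SXY, SX2 = sum(X), sum(Y), sum(XY), sum(X2)   # +/ of each vector
D = n * SX2 - SX ** 2                        # (15×SX2)-SX*2
A0 = (SY * SX2 - SX * SXY) / D               # Equation (1.4) for a0
A1 = (n * SXY - SX * SY) / D                 # Equation (1.4) for a1

print(SX, SY, SXY, SX2, D)   # 1016 2195 149810 69198 5714
print(A0, A1)                # about -55.539 and 2.980
```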


We have given the least-squares calculation as an example of how vectors are
used to handle calculations involving large quantities of data. The procedure
above is not the most efficient way of making a least-squares calculation. As more
linear algebra and APL are introduced, the least-squares calculation will be made
more general and more efficient.

Incidentally, the extension of the usual arithmetic operations to vectors is
what makes the “high minus” necessary. Consider the two calculations below.

      1 2 ¯3 4
1 2 ¯3 4
      1 2-3 4
¯2 ¯2

In the first calculation we have a vector whose third component is negative. In
the second calculation the vector 3 4 is subtracted from the vector 1 2.

The arithmetic functions +, -, ×, ÷, * are not the only dyadic functions that
operate componentwise on vectors. Any of the scalar dyadic or monadic functions
operate componentwise on vectors. Here are some examples. The definitions of
the functions may be found in Appendix B.

      !2 3 4 5
2 6 24 120
      A←1 2 ¯3 4
      +A
1 2 ¯3 4
      ÷A
1 0.5 ¯0.3333333333 0.25
      *A
2.718281828 7.389056099 0.04978706837 54.59815003


23456 


23456 


      2 3 4⍟4 5 6
2 1.464973521 1.29248125
      1 2 3○4 5 6
¯0.7568024953 0.2836621855 ¯0.2910061914
248123 
BB 2 




EXAMPLE 1.20 What is the end result of the APL expressions:

      X←1 3 2 7
      Y←¯1 2 7 1
      X-Y

Solution 1 3 2 7-¯1 2 7 1, or 2 1 ¯5 6.


EXAMPLE 1.21 What is the result of the APL expressions:

      A←2 ¯2.7 ¯12 14
      A+|A

Solution From Appendix B we see that monadic | is absolute value. Thus A+|A
is 2 ¯2.7 ¯12 14+2 2.7 12 14, or 4 0 0 28.


REDUCTION
The expression +/X used above to sum the components of a vector is an example
of the use of the APL reduction operator. The function “+” may be replaced by
any of the dyadic functions from Appendix B. For example, to multiply the
components of a vector together use “×” instead of “+”:

      X←5 4 3 2 1
      ×/X
120

If f is a dyadic function from Appendix B, then the expression f/X produces
the same result as putting an f between each pair of components of X. Thus, with
the above X, +/X produces the same result as 5+4+3+2+1 and ×/X produces the
same result as 5×4×3×2×1. Here are three other useful reductions:


1. The maximum component of X. The expression ⌈/X gives the maximum
component of X. For example, with the above X, ⌈/X is 5⌈4⌈3⌈2⌈1 or
5⌈(4⌈(3⌈(2⌈1))).

      ⌈/X
5

2. The minimum component of X. Similarly, ⌊/X picks out the smallest
component of the vector X.

      ⌊/X
1




3. The alternating sum. The expression -/X computes the alternating sum of
the components of X. For the above X, -/X is 5-4-3-2-1.
In conventional notation this is

$$5 - (4 - (3 - (2 - 1))) = 5 - 4 + 3 - 2 + 1$$
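Because APL evaluates right to left, f/X groups as x₁ f (x₂ f (⋯ f xₙ)), which matters for non-associative functions such as subtraction. A minimal Python sketch of that right-to-left reduction follows; `apl_reduce` is a hypothetical helper, and it assumes a non-empty vector (the empty-vector case is not handled).

```python
from functools import reduce
import operator


def apl_reduce(fn, vec):
    """APL f/vec for a non-empty list: evaluates x1 f (x2 f (... f xn))."""
    # Start from the last component and fold leftward, keeping fn's argument order.
    return reduce(lambda acc, x: fn(x, acc), reversed(vec[:-1]), vec[-1])


X = [5, 4, 3, 2, 1]
print(apl_reduce(operator.add, X))   # 15   (+/X)
print(apl_reduce(operator.mul, X))   # 120  (×/X)
print(apl_reduce(max, X))            # 5    (⌈/X)
print(apl_reduce(min, X))            # 1    (⌊/X)
print(apl_reduce(operator.sub, X))   # 3    (-/X is 5-4+3-2+1)
```

A left-to-right fold would instead give 5-4-3-2-1 = -5 for subtraction, which is why the argument order inside the lambda is swapped.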


VECTORS AND SCALARS 


A vector X is a list of numbers and ⍴X gives the size of the list. An assignment of
the form A←3, however, defines A to be a scalar. A scalar is simply a number that
is not considered to be part of any list. The expression ⍴A or ⍴3 will cause a
blank line to be printed. We will return to this point shortly.

We have seen that A×B, for example, is defined if A and B are both scalars
or if A and B are vectors of the same size. The expression A×B is also defined if
only one of the variables A or B is a scalar.

      2×3 4 5
6 8 10
      3 4 5×6
18 24 30


The scalar is first extended to a vector of the proper length and then the
multiplication is carried out. Thus the first calculation is the same as
2 2 2×3 4 5 and the second is the same as 3 4 5×6 6 6. Similar remarks apply
to the other scalar dyadic functions.

      2+3 4 5
5 6 7
      3 4 5*6
729 4096 15625
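Scalar extension amounts to a small dispatch on the argument types. Here is a minimal Python sketch, assuming vectors are Python lists and scalars are plain numbers; `apl_scalar_dyadic` is a hypothetical name, not an APL primitive.

```python
import operator


def apl_scalar_dyadic(fn, a, b):
    """Scalar dyadic function with APL-style extension of a scalar argument."""
    a_vec, b_vec = isinstance(a, list), isinstance(b, list)
    if a_vec and b_vec:
        if len(a) != len(b):
            raise ValueError("LENGTH ERROR")
        return [fn(x, y) for x, y in zip(a, b)]
    if a_vec:                        # extend scalar b to the length of a
        return [fn(x, b) for x in a]
    if b_vec:                        # extend scalar a to the length of b
        return [fn(a, y) for y in b]
    return fn(a, b)                  # two scalars


print(apl_scalar_dyadic(operator.mul, 2, [3, 4, 5]))   # [6, 8, 10]
print(apl_scalar_dyadic(operator.mul, [3, 4, 5], 6))   # [18, 24, 30]
print(apl_scalar_dyadic(operator.add, 2, [3, 4, 5]))   # [5, 6, 7]
print(apl_scalar_dyadic(operator.pow, [3, 4, 5], 6))   # [729, 4096, 15625]
```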
The utility of this special action for scalars is illustrated by the next example.

POLYNOMIALS
The conventional expression 3x² − 2x + 7 may be written in APL as

      +/3 ¯2 7×X*2 1 0        X a scalar

To see this we work from the right, evaluating each function as we encounter it.
This procedure shows that the APL expression is equivalent to

      +/(3 ¯2 7×(X*2 1 0))




If X is a scalar, then X*2 1 0 creates the vector x² x¹ x⁰. The multiplication
then results in the vector 3x² −2x 7, and the +/ sums the components of this
vector.

EXAMPLE 1.22 Compute the value of the polynomial x² − 2x + 1 when x = 12.

Solution

      +/1 ¯2 1×12*2 1 0
121


EXAMPLE 1.23 Compute the value of the polynomial 3x⁵ − 6x + 2 for x = 6,
−2, √7, e.

Solution

      P←3 ¯6 2        Since several evaluations are requested, we store the
      E←5 1 0         coefficient vector and exponent vector.
      +/P×6*E
23294
      +/P×¯2*E
¯82
      X←7*.5
      +/P×X*E         Alternatively, +/P×(7*.5)*E. But the latter expression
375.0509349           is more prone to typing error.
      X←*1
      +/P×X*E         Alternatively, +/P×(*1)*E
430.9297863
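The idea of storing a coefficient vector P and an exponent vector E carries over directly to other languages. A minimal Python sketch of +/P×X*E for the four values of Example 1.23 (the name `poly` is hypothetical):

```python
import math

P = [3, -6, 2]      # coefficient vector, P←3 ¯6 2
E = [5, 1, 0]       # exponent vector,    E←5 1 0


def poly(x):
    """+/P×X*E : raise x to each exponent, multiply by the coefficients, and sum."""
    return sum(p * x ** e for p, e in zip(P, E))


for x in (6, -2, math.sqrt(7), math.e):
    print(x, poly(x))
```

The results should agree with the APL session: 23294, -82, and approximately 375.0509349 and 430.9297863.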


LINEAR COMBINATIONS
Two of the vector operations defined above are fundamental in the study of linear
algebra. These operations are

1. The addition of two vectors.
2. The multiplication of a vector and a scalar.

Let V be a vector and a a scalar. In the conventional mathematical notation
a×V is written aV. Further, vectors are usually set off in parentheses with their
components separated by commas. Thus the vector 1 ¯3 7 is conventionally
written (1, −3, 7). There is some conflict between APL notation and conventional
notation here, but both systems should be mastered. To avoid confusion, we shall
always display APL expressions in a special typeface.

In conventional notation the definitions of vector addition and vector-scalar
multiplication become

1. $(a_1, a_2, \ldots, a_n) + (b_1, b_2, \ldots, b_n) = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n)$
2. $\alpha(b_1, b_2, \ldots, b_n) = (\alpha b_1, \alpha b_2, \ldots, \alpha b_n)$

Combining the two operations in a single expression gives a linear combination.

DEFINITION 1.2 Let $v_1, v_2, \ldots, v_k$ be vectors of the same size. Let $a_1, a_2, \ldots, a_k$
be scalars. The vector

$$b = a_1 v_1 + a_2 v_2 + \cdots + a_k v_k$$

is a linear combination of the vectors $v_1, \ldots, v_k$.


EXAMPLE 1.24 Let $v_1 = (1, 3, 7)$, $v_2 = (-1, 2, 12.3)$, $v_3 = (0, 2, -4)$, and $a_1 = 6$,
$a_2 = .3$, $a_3 = 7$. Compute the linear combination

$$v = a_1 v_1 + a_2 v_2 + a_3 v_3$$

Solution

      +AV←(6×1 3 7)+(.3×¯1 2 12.3)+7×0 2 ¯4
5.7 32.6 17.69

Notice that the APL order of evaluation makes the parentheses necessary.
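Definition 1.2 translates into a short function. This minimal Python sketch assumes vectors are lists of equal size; `linear_combination` is a hypothetical name. It reproduces Example 1.24 up to floating-point rounding.

```python
def linear_combination(coeffs, vectors):
    """Return a1*v1 + a2*v2 + ... + ak*vk for vectors of the same size."""
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("vectors must have the same size")
    return [sum(a * v[i] for a, v in zip(coeffs, vectors)) for i in range(size)]


v1, v2, v3 = [1, 3, 7], [-1, 2, 12.3], [0, 2, -4]
b = linear_combination([6, 0.3, 7], [v1, v2, v3])
print([round(c, 10) for c in b])   # [5.7, 32.6, 17.69]
```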


CATENATION
So far we have not explained how to express a vector such as (2, √3, 4) in APL. If
we try the expression

      2 3*.5 4

we obtain

1.414213562 81

which is (√2, 3⁴) in conventional notation.

This problem is overcome by introducing the catenation function. This function
is denoted in APL by the comma (,). Catenation sticks vectors and scalars
together to make larger vectors.




2.3 If x is the vector of indices (1, 2, 3,..., 400), then the APL notation for this 
23 sum is 
2,345 
2345 teKK 
23,4567 
234567 This observation reduces the problem of computing the sum to the problem of 
2,(3+-5).4 


generating the vector of indices K. The function that accomplishes this in APL is
the index generator, whose symbol is the monadic ⍳. If n is a nonnegative integer,
then ⍳n is the vector (1, 2, 3, …, n).


2 1.732050808 4


Catenation is a dyadic function and is treated like any other dyadic function. 
Thus, working in from the right on the last expression, we see that it is equivalent 


s 
% y2345 
2 
      2,((3*.5),4)
23 


      ⍳23
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23


2.34 5.4) 400 

Now we can compute 5S) (k)¥* 
which, in conventional notation, is (2, √3, 4).
EXAMPLE 1.25 Write the vector (√2, e, −7, 14) in APL notation.


6 045595546652 


Solution (2*÷2),(*1),¯7 14 or (2*.5),(*1),¯7 14


The E indicates scientific notation. For example, 6.2E¯2 means 6.2 × 10⁻², or .062.

EXAMPLE 1.26 Write the APL expression 2,*3,-2 1 in conventional notation.

Solution Working in from the right, we have

      2,(*(3,(-2 1)))
      2,*3 ¯2 ¯1

In conventional notation this is the vector (2, e³, e⁻², e⁻¹).

EXAMPLE 1.27 Compute $\sum_{k=1}^{100} (k^2 + 4k)$.

Solution

      K←⍳100
      +/(4×K)+K*2
358550
      +/(4×⍳100)+(⍳100)*2
358550
20 
: Exampte 1.28 Compute S k* + 


INDEX GENERATOR 


Consider the problem of evaluating the sum 


Solution 
400 
Sw Ke-144633 
+ ORKKT 


3652 . 




The result of the index generator is always a vector. Thus ⍳1 is not the scalar
1 but a vector of size 1.


      ⍴⍳1
1

Further, ⍳0 is defined. It is a vector without components, and it prints as a
blank line.

      ⍳0
                  ← blank line
      ⍴⍳0
0


The vector ⍳0 is really quite useful because it is an identity for catenation,†
just as 0 is an identity for addition and 1 is an identity for multiplication.
We are now in a position to describe what ⍴ does with a scalar: it returns ⍳0.

      ⍴3
                  ← blank line

The result of monadic ⍴ is always a vector.
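The index generator and the empty vector have direct counterparts in most languages. In this minimal Python sketch, `iota` is a hypothetical stand-in for 1-origin ⍳, list concatenation plays the role of catenation, and the last lines redo the sum of Example 1.27, Σ(k² + 4k) for k = 1 to 100.

```python
def iota(n):
    """Stand-in for APL ⍳n (1-origin): [1, 2, ..., n]; iota(0) is the empty list."""
    return list(range(1, n + 1))


V = [1, 2, -7]
print(V + iota(0) == V)   # True: the empty vector is an identity for catenation
print(len(iota(1)))       # 1: ⍳1 is a vector of size 1, not a scalar

K = iota(100)
total = sum(4 * k + k ** 2 for k in K)   # +/(4×K)+K*2
print(total)              # 358550
```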


INDEXING

Often we must deal with individual components of vectors. If V is a vector, then
the third component of V is written V[3].

      V←1 2 ¯7 4 12
      V[3]
¯7
      V[5]
12
      V[3 5]
¯7 12
      V[4 2 3]
4 2 ¯7


Notice that we are not restricted to single indices. A vector of indices will
produce a vector result. Indexing may also be used to change individual
components of a vector.

†That is, V,⍳0 is V just as V+0 and V×1 are both V.


      V
1 2 ¯7 4 12
      V[2]←6
      V
1 6 ¯7 4 12
      V[1 5]←2 3
      V
2 6 ¯7 4 3


EXAMPLE 1.29 What is the result of the APL expressions:

      V←1 3 5 7
      V[1 3]←V[3 1]
      V

Solution V[3 1] is the vector 5 1; thus the second line is V[1 3]←5 1. Thus
V becomes 5 3 1 7.
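Python lists are 0-origin, so imitating APL indexing needs a shift by one. Here is a minimal sketch with hypothetical helpers `index` and `assign`; as in APL, the right-hand side V[3 1] is evaluated before any component is changed, which is what makes Example 1.29 a swap.

```python
def index(v, idx):
    """APL V[idx] with 1-origin indices; idx may be one index or a list of indices."""
    if isinstance(idx, list):
        return [v[i - 1] for i in idx]
    return v[idx - 1]


def assign(v, idx, values):
    """APL V[idx]←values, modifying v in place (values are already computed)."""
    for i, x in zip(idx, values):
        v[i - 1] = x


V = [1, 3, 5, 7]
assign(V, [1, 3], index(V, [3, 1]))   # V[1 3]←V[3 1]
print(V)                              # [5, 3, 1, 7]
```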


EXERCISES 1.2

What is the result of the following APL expressions? Calculate by hand and check your
answers at a terminal if one is available.

I: 42=3%4 2 42034 312-9 4-56 

4.1 2x3 4.12 Sat 23-45 -6 6 Art “23-46 -6 
(AvlAaya2 (A-[A)o2 

7 (19 219-3 2 aR & (5678-1234 

9% (5678-1 -23 4 WW 21234567 


12 aoos4 
(1 2 3701 2 30°49.832 


M1 23455 6 7/10 
13, 2 4 6 02 4 6 8+-49.832 


Write APL expressions to evaluate the polynomials at the points indicated. Compute the
results at a terminal if available.

15. 3x? —2e + Lae 
16. Sx — x +2 atx =3, V2 sin(x/13) 

17, x! = x — x8 43x — Date = 98, 1, 1.01 

18, x 20! 4 x! — xh 4 ax? — Sy + 2 at x = 98, 1, 1.01 


For the following problems, write APL expressions to compute the linear combinations
$a_1 v_1 + a_2 v_2 + a_3 v_3 + \cdots + a_k v_k$. Evaluate the expressions at a terminal if available.
19. 0, = (1.0.1), vp = (0.1.0), a, = 6, a =3 

= (1.0, 1), ¥; = (0, 1,0), a, = =12, 0 
2 vy =U1,§, 1), ve = (1.2,3), 0, = 6,02 =4 
(1,2,3,4,5). 05 


16 


2,-7.3, 6,9) 4, =2, ay = —I, 




Write APL expressions to evaluate the following sums. Evaluate the expressions at a
terminal if available.


0 
24. Ske +k) 23. Sie +0 


=1ysgee %  2nk Ink 
(sibeee 28, Si sin cos 
ORs 1 S ** 00 “°° 00 


29. Recall that

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$

Thus for each n, $\sum_{k=0}^{n} x^k/k!$ is an approximation to $e^x$. Taking x = −10, what value of
n provides the closest approximation to *¯10 on your machine?
Let f be a continuous function on the interval [a, b]. We can approximate

$$\int_a^b f(x)\,dx$$

by the sum


S fly) Axi where 


This approximates the area under the curve by a sum of areas of rectangles, as in the
definition of the Riemann integral.
In the following problems compare the approximation $\sum f(x_i)\,\Delta x_i$ with the exact
value of $\int_a^b f(x)\,dx$ for the given f, a, and b. Take n = 100.


30. fo =xa=0,b=1 31. fy) =x a =0,b=1 
32. fl . 6 = 2/2 33. fix) =sinx.a =0,b =o 
34. f(x) = (In /x, a = 1b 35. fix) = cos 3x sin Sx, a = 0, b = 2 


COS x, a 


What is the value of the vector V? Answers should be worked by hand and checked at the
terminal if one is available.


36. void 37. ve 9 38 v7 
Vi2\-2+ 5 VIN 3 7-246 viz 3}-vE3 21 




39. wen ase7 9. vais al var 

vewi1 5454 5) vev[6-V} vi 3 7-8 
a vie B vis 

vin 6 vor every 


44. Let V be the vector denoted by $(a_1, a_2, \ldots, a_n)$ in conventional mathematical
notation. By translating the APL expression

      AVE←(+/V)÷⍴V

to conventional notation show that AVE is the average of the components of V.

Let V and W be the vectors denoted by $(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$ in
conventional notation. By translating to conventional mathematical notation, show that the
following expressions are true in the sense that the result contains no zeros.


45. CoV.Wo= Cov) sow 46. Vet vW 
(Assume (0¥) <0) 
47. (CV) +e Wee Ww 48. VeVio pV) 


49. The result of the monadic function ⍴ is always a vector. Using this fact, what is ⍴⍴V
for a vector V? For a scalar V?

50. Show that if the identity of exercise 47 is to hold for all vectors V, then 0=+/⍳0.


Compute the following expressions and check your answers at a terminal if one is
available.

Sl rz ae $2. wre 

53. (147)=xy 417 34, 

5S. 6281-7 56, 7 

57. 9/222 38. | 


59. 0/2 1.02 


60. The following table gives the production of steel from 1946 to 1956.


Year | 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956


Tons of steel | 666 849 886 780 %8 1952 932 1116 883 1170 1152 


(a) Fit a least-squares straight line to these data, writing steel production as a 
function of time. 


(b) Using your equation from part (a), solve for time as a function of steel
production.


(c) Compute the least-squares straight line, giving time as a function of steel
production. Is this the equation obtained in part (b)?


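61. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be given and set

$$s = \sum_{i=1}^{n} (y_i - b - a x_i)^2$$

Solve the equations

$$\frac{\partial s}{\partial a} = 0 \quad\text{or}\quad \Bigl(\sum x_i^2\Bigr) a + \Bigl(\sum x_i\Bigr) b = \sum x_i y_i$$

$$\frac{\partial s}{\partial b} = 0 \quad\text{or}\quad \Bigl(\sum x_i\Bigr) a + n b = \sum y_i$$

to find the values of a and b that minimize s.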

The following table summarizes information from eight gasoline credit card receipts. 
(The car, a 1976 Dodge Aspen station wagon, had its engine tuned just before the first 
gasoline purchase recorded here.) 


13° Matrices 31 


Sain T 
reading 39922 | anias_|aas27_ | 40s26_| 0762 | 40937 | st0e7 | 41289 
Date | 12/20/78 | 1226/78 | 12/28/98 | 20/78 | 379 [1/1179 | 1723/79 | 1/3079 
Gallons | | 

purchased | 14 us | 3 | a | 49 | ns | tas 

Cost in | 

dollars | 1090 | 1136 | tot | 104s | 473 | tose 


Define the following vectors:

M = miles = first row of table

T = time = second row of table, with the dates rewritten as days from some arbitrary
point in time. Say day 1 is 12/1/78, so that 12/20/78 becomes 20, 12/26/78
becomes 26, and so on.

G = gallons = third row of table

C = cost = fourth row of table

Use these vectors to compute the following quantities:

62. The price per gallon, in cents, for each of the eight purchases.
63. The average price per gallon, in cents, for the time covered by the table.
64. The times, in days, between purchases.
65. The distances traveled, in miles, between purchases.
66. The average distance traveled per day between each pair of purchases.
67. Compute the gasoline mileage, in miles per gallon:

(a) Between purchases

(b) Overall

68. Let y be gasoline mileage and x be miles per day traveled. Fit a least-squares straight
line to the data from exercises 66 and 67a to estimate y as a function of x. Plot the line and
data on the same graph.


1.3 Matrices 


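A matrix is a rectangular array or “table” of numbers. An example of a matrix is
the body of Table 1.2 of Section 1.2, which we will denote by M (the row of totals
has been omitted).

60 120  7200 3600
61 120  7320 3721
62 135  8370 3844
63 135  8505 3969
64 130  8320 4096
65 135  8775 4225
66 150  9900 4356
68 140  9520 4624
69 170 11730 4761
70 145 10150 4900
71 160 11360 5041
72 160 11520 5184
74 160 11840 5476
75 160 12000 5625
76 175 13300 5776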


This matrix has 15 rows and 4 columns. Any entry in the matrix is uniquely
determined by giving its row and column. For example, the element in the ninth
row and third column is 11730; the entry in the first row and first column is 60.
Matrices are useful whenever one has a collection of numbers that can be
naturally arranged in a row-column format.
For example, consider the system of linear equations 


3x − 7y + 12z =
8x − 9y = 7
2x + y − 16z = −1


The coefficients of the unknowns form a matrix:

$$\begin{bmatrix} 3 & -7 & 12 \\ 8 & -9 & 0 \\ 2 & 1 & -16 \end{bmatrix}$$


which we will use to solve such systems. 

In APL notation the entry in the ninth row and third column of a matrix M is
denoted by M[9;3], the entry in the first row and first column is M[1;1],
and so on. Using the matrix M defined above, for example, we have

      M[9;3]
11730
      M[1;1]
60
      M[7;4]
4356




The individual rows and columns of a matrix are vectors. Often we shall
regard a matrix as a convenient way of manipulating a set of vectors. In APL
notation, we refer to a row vector by omitting the column reference, and we refer
to a column by omitting the row reference.

      M[1;]
60 120 7200 3600
      M[;3]
7200 7320 8370 8505 8320 8775 9900 9520 11730 10150 11360 11520 11840 12000 13300
This notation is also used to replace entries of a matrix.

      M[10;3]←0
      M
60 120  7200 3600
61 120  7320 3721
62 135  8370 3844
63 135  8505 3969
64 130  8320 4096
65 135  8775 4225
66 150  9900 4356
68 140  9520 4624
69 170 11730 4761
70 145     0 4900
71 160 11360 5041
72 160 11520 5184
74 160 11840 5476
75 160 12000 5625
76 175 13300 5776

The entry in row 10, column 3 has been changed from 10150 to 0.


      M[12;]←4 3 2 1
      M
60 120  7200 3600
61 120  7320 3721
62 135  8370 3844
63 135  8505 3969
64 130  8320 4096
65 135  8775 4225
66 150  9900 4356
68 140  9520 4624
69 170 11730 4761
70 145     0 4900
71 160 11360 5041
 4   3     2    1
74 160 11840 5476
75 160 12000 5625
76 175 13300 5776

In this case the vector M[12;] has been replaced by the vector 4 3 2 1.


⍴ FOR MATRICES

The monadic function ⍴ is called size. The action of ⍴ on vectors and scalars has
been discussed in Section 1.2. If V is a vector, then ⍴V gives the number of
components of V. If A is a scalar, then ⍴A is the empty vector (⍳0). In both
cases the result of ⍴ is a vector. The vector ⍴X has one component if X is a vector
and zero components if X is a scalar.

If M is a matrix, then ⍴M is a vector with two components. The first component
is the number of rows of M and the second component is the number of
columns of M.


The symbol ⍴ also represents a dyadic function called reshape. The dyadic ⍴
is used to define matrices. Here is how it works.

      3 2⍴1 3 7 9 4 6
1 3
7 9
4 6
      2 2⍴1 3 7 9 4 6
1 3
7 9

The dyadic function ⍴ takes a vector left argument and any variable right
argument. It reshapes the right argument to the shape specified by the left
argument.

The dyadic ⍴ uses as many data from the right argument as are needed. If
there are not enough data, it returns to the start and reuses the data.

      3 3⍴1 2 3 4
1 2 3
4 1 2
3 4 1
      3 3⍴5
5 5 5
5 5 5
5 5 5

If the left argument is a scalar, it is taken to be a single-component vector. In
this case the result is a vector.

      6⍴5
5 5 5 5 5 5
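The reuse-from-the-start rule is easy to express with a cycling iterator. This is a minimal Python sketch for vector and two-dimensional shapes only; `reshape` is a hypothetical helper (not APL's full reshape), matrices are lists of rows, and an empty right argument is not handled.

```python
from itertools import cycle, islice


def reshape(shape, data):
    """APL shape⍴data for 1 or 2 dimensions, reusing data cyclically."""
    data = data if isinstance(data, list) else [data]   # a scalar right argument
    if isinstance(shape, int):                          # scalar left argument: a vector
        return list(islice(cycle(data), shape))
    rows, cols = shape
    flat = list(islice(cycle(data), rows * cols))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


print(reshape((3, 2), [1, 3, 7, 9, 4, 6]))   # [[1, 3], [7, 9], [4, 6]]
print(reshape((3, 3), [1, 2, 3, 4]))         # [[1, 2, 3], [4, 1, 2], [3, 4, 1]]
print(reshape(6, 5))                         # [5, 5, 5, 5, 5, 5]
```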


To enter a matrix into a machine, we first enter the data as a vector and then
reshape the vector.

EXAMPLE 1.30 Enter the matrix


from the terminal. 


Solution 
x1 27963815 
Keb 4px 
x 
Tinie te ie 
isn ie os td 
1204 6 14 
a ae ee 
ee ve 5 
EXAMPLE 1.31 What is M if


Ss 
eu wb 


2161447931824 


      V←1 3 ¯5 7 2
      M←2 3⍴V


Reonwn 


5 


14 


7 


2 




Solution The second line reshapes V to a 2-by-3 matrix. This requires six entries.
After using the five numbers 1, 3, −5, 7, 2, we return to the beginning. Thus the 1
is used twice. M is

1 3 ¯5
7 2  1


EXAMPLE 1.32 What is M if
Me2 397 12-14-9868 
      M[1;3]←0
      M[;2]←6 5


Solution The first line defines M to be the 2-by-3 matrix


The last line replaces the second column of # by the vector ~8 


finally 


s the 1:3 entry of m by zero, Thus 


5. Thus Mis 


The APL indexing scheme for matrices is somewhat more elaborate than has 
been indicated so far. Suppose, for example, that m is the matrix 


‘An expression such 
intersection of rows 2 and 4 with columns 1 and 3 


4-6 8 12 
rw2> © 3 O71 

11 4 2 
rw4> @ 4 @ 1 


column 1 column 3 


S M{2 4.1 3) specifies the submatrix of m at the 




Hence M[2 4;1 3] is the matrix

6 0
6 3

In fact, an expression such as M[1;], which specifies the first row of M, is
just an abbreviation for M[1;⍳N], where N is the number of columns of M.
Similarly, M[;6] means M[⍳R;6], where R is the number of rows of M.
(M[1;⍳N] is a vector rather than a 1-by-N matrix, however. This somewhat subtle
point is developed further in Exercise 16.)

The next example is important for later use.

EXAMPLE 1.33 Assume that M has at least four rows. What is the effect of the
expression

      M[2 4;]←M[4 2;]

Solution M[4 2;] is an abbreviation for M[4 2;⍳N], where N is the number of
columns of M. Thus M[4 2;] is a 2-by-N matrix whose first row is the
fourth row of M and whose second row is the second row of M. This matrix is to
replace the submatrix of M denoted by M[2 4;], that is, the submatrix
consisting of the second and fourth rows of M. The end effect of the expression is
thus to interchange the second and fourth rows of M. The expression

      M[4 2;]←M[2 4;]

has precisely the same effect.
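The row interchange of Example 1.33 is a step that recurs in elimination methods. A minimal Python sketch, with the matrix stored as a list of row lists; `swap_rows` is a hypothetical helper using 1-origin row numbers, and it copies the rows first, just as APL evaluates M[4 2;] completely before assigning.

```python
def swap_rows(M, i, j):
    """Imitate M[i j;]←M[j i;] with 1-origin row numbers, modifying M in place."""
    new_rows = [list(M[j - 1]), list(M[i - 1])]   # M[j i;], evaluated first
    M[i - 1], M[j - 1] = new_rows
    return M


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
swap_rows(M, 2, 4)   # M[2 4;]←M[4 2;]
print(M)             # [[1, 2, 3], [10, 11, 12], [7, 8, 9], [4, 5, 6]]
```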


Notice in passing that just as a vector may have size 0, matrices may have
0 rows or columns.

      M←1 0⍴0
      M
                  ← M is displayed as a blank line
      ⍴M
1 0


SCALAR FUNCTIONS FOR MATRICES

The primitive scalar functions (Appendix B) extend to matrices as well as to
vectors (parallel processing).
Monadic functions simply operate on each entry.

      M←3 3⍴⍳9
      M
1 2 3
4 5 6
7 8 9


      -M
¯1 ¯2 ¯3
¯4 ¯5 ¯6
¯7 ¯8 ¯9
      *M
   2.720    7.390   20.100
  54.600  148.000  403.000
1100.000 2980.000 8100.000
      ÷M
1.000 0.500 0.333
0.250 0.200 0.167
0.143 0.125 0.111

If two matrices are the same shape, then dyadic scalar functions operate on
corresponding entries.


      N←M[3 2 1;2 1 3]
      N
8 7 9
5 4 6
2 1 3
      N-M
 7  5  6
 1 ¯1  0
¯5 ¯7 ¯6
      N÷M
8.000 3.500 3.000
1.250 0.800 1.000
0.286 0.125 0.333
      N*M
  8.000    49.000   729.000
625.000  1020.000 46700.000
128.000     1.000 19700.000
128.000 1.000 19700 000 
Nat 

yay ia 

1) a 0: 

yoy a 


If A, B are variables (scalars, vectors, or matrices) and α is a scalar dyadic
function, then AαB is defined if (⍴A)=⍴B.

However, AαB is also defined if one of A or B is a scalar or a vector of size 1.
In this case, the scalar or vector of size 1 is first reshaped to match the other
variable. Then the computation is carried out.
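The reshaping rule can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: a matrix is a list of rows, an APL scalar is a plain number, and a vector of size 1 is a one-element flat list. General vectors are not handled. The names `scalar_dyadic`, `is_singleton`, and `value` are hypothetical.

```python
import operator


def is_singleton(x):
    """True for a plain number (APL scalar) or a one-element flat list (vector of size 1)."""
    return isinstance(x, (int, float)) or (
        isinstance(x, list) and len(x) == 1 and not isinstance(x[0], list)
    )


def value(x):
    """The single item held by a scalar or a vector of size 1."""
    return x if isinstance(x, (int, float)) else x[0]


def scalar_dyadic(f, a, b):
    """Apply f entrywise to matrices a and b (lists of rows).

    A scalar or one-element vector is first reshaped to the shape of the other argument.
    """
    if is_singleton(a) and is_singleton(b):
        return f(value(a), value(b))
    if is_singleton(a):
        a = [[value(a)] * len(row) for row in b]
    elif is_singleton(b):
        b = [[value(b)] * len(row) for row in a]
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("shapes do not match")
    return [[f(x, y) for x, y in zip(r, s)] for r, s in zip(a, b)]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(scalar_dyadic(operator.mul, 3, M))    # like 3×M
print(scalar_dyadic(operator.sub, M, [3]))  # like M-V with V←1⍴3
```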


38 Vectors, Matrices, and APL 


      3×M
 3  6  9
12 15 18
21 24 27
      M-3
¯2 ¯1 0
 1  2 3
 4  5 6
      V←1⍴3
      ⍴V
1
      V×M
 3  6  9
12 15 18
21 24 27
      M-V
¯2 ¯1 0
 1  2 3
 4  5 6

The dyadic ⍴ is used here to reshape the scalar 3 to a vector with 1 component.
The function Ravel, defined below, could also be used.


Two operations on matrices are of special importance for linear algebra: addition
and scalar multiplication. If α is a scalar and A is a matrix, we will abbreviate
α×A to αA or Aα. We give formal definitions of these two operations.


Definition 1.3 Let A and B be matrices of the same shape. The sum A + B of A
and B is defined by

$$(A + B)[i;j] = A[i;j] + B[i;j]$$

for all valid indices i, j.


Definition 1.4 Let A be a matrix and α a scalar. The product αA (or Aα) is
defined by

$$(\alpha A)[i;j] = \alpha\,A[i;j]$$

for all valid indices i, j.
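Definitions 1.3 and 1.4 translate almost symbol for symbol into index loops. Below is a minimal Python sketch, assuming matrices are stored as lists of rows (0-based in Python where the text counts from 1). The function names are hypothetical.

```python
def mat_add(a, b):
    """Definition 1.3: (A + B)[i;j] = A[i;j] + B[i;j] for matrices of the same shape."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("A and B must have the same shape")
    return [[a[i][j] + b[i][j] for j in range(len(a[i]))] for i in range(len(a))]


def scalar_mul(alpha, a):
    """Definition 1.4: (alpha A)[i;j] = alpha * A[i;j]."""
    return [[alpha * a[i][j] for j in range(len(a[i]))] for i in range(len(a))]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
N = [[8, 7, 9], [5, 4, 6], [2, 1, 3]]
print(mat_add(M, N))
print(scalar_mul(3, M))
```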


RAVEL 


Ravel is a monadic function denoted by the comma “,”. For any variable X
(scalar, vector, or matrix), ,X is a vector.
For example, ,3 is the vector of size 1 whose single component is 3.


      V←,3
      ⍴V
1
If X is a vector, then ,X is simply X. If X is a matrix, then ,X strings the
components out into a vector. In this case Ravel reverses the effect of the reshape
function.
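Below is a rough Python analogue of Ravel and of a two-dimensional reshape. It assumes the book's row-by-row ordering and represents a matrix as a list of rows. The names `ravel` and `reshape` are hypothetical. Unlike APL, this reshape refuses empty data.

```python
def ravel(x):
    """Mimic APL monadic ',' for a scalar, vector (flat list), or matrix (list of rows)."""
    if isinstance(x, (int, float)):
        return [x]                                  # ,3 is a vector of size 1
    if x and isinstance(x[0], list):
        return [v for row in x for v in row]        # matrix: rows strung out in order
    return list(x)                                  # vector: unchanged


def reshape(rows, cols, data):
    """Mimic APL (rows,cols)⍴data, reusing the items of data cyclically."""
    flat = ravel(data)
    if not flat:
        raise ValueError("this sketch needs at least one item to reshape")
    return [[flat[(i * cols + j) % len(flat)] for j in range(cols)] for i in range(rows)]


M = reshape(2, 3, [1, 2, 3, 4, 5, 6])
print(ravel(M))   # Ravel undoes the reshape
```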


M2 3ph 6 3 0 2 4 


CATENATION FOR MATRICES


The catenation function permits matrix arguments. The form we will find most
useful is the simplest. If P and Q have the same number of rows, then P,Q is the
matrix obtained by catenating corresponding rows. If P is m by n and Q is m by
r, then P,Q is m by (n + r).


      ⎕←P←2 3⍴⍳6
1 2 3
4 5 6
      ⎕←Q←2 3⍴6+⍳6
 7  8  9
10 11 12
      P,Q
1 2 3  7  8  9
4 5 6 10 11 12


The columns may also be catenated. We accomplish this by indexing the
symbol “,”. The expression “,[1]” means catenate along the first or row
index. The “P,Q” above is actually an abbreviation of “P,[2]Q”, which
means catenate along the second or column index.

As a general rule in APL, if the index is omitted, the last index (the column
index for matrices) is the one that is affected.


      P,[1]Q
 1  2  3
 4  5  6
 7  8  9
10 11 12
      P,[2]Q
1 2 3  7  8  9
4 5 6 10 11 12
      ⍴P
2 3
      ⍴Q
2 3
      ⍴P,[1]Q
4 3
      ⍴P,[2]Q
2 6


P and Q need not both be matrices. One may be a scalar (which will then be
reshaped first) or a vector of the proper size.
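Both forms of catenation can be sketched in Python. The sketch assumes matrices are lists of rows. The axis numbers follow APL (1 = rows, 2 = columns), and 2 is the default, as in APL. The function name `catenate` is hypothetical, and the scalar and vector cases mentioned above are not handled.

```python
def catenate(p, q, axis=2):
    """Mimic APL P,[axis]Q for matrices stored as lists of rows.

    axis=2 (the default, as in APL) joins corresponding rows side by side;
    axis=1 stacks the rows of Q below those of P.
    """
    if axis == 2:
        if len(p) != len(q):
            raise ValueError("P and Q must have the same number of rows")
        return [list(r) + list(s) for r, s in zip(p, q)]
    if axis == 1:
        if p and q and len(p[0]) != len(q[0]):
            raise ValueError("P and Q must have the same number of columns")
        return [list(r) for r in p] + [list(s) for s in q]
    raise ValueError("axis must be 1 or 2")


P = [[1, 2, 3], [4, 5, 6]]
Q = [[7, 8, 9], [10, 11, 12]]
print(catenate(P, Q))       # P,Q   (same as P,[2]Q)
print(catenate(P, Q, 1))    # P,[1]Q
```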


REDUCTION 


The summation “+/” operates on matrices in a fashion similar to catenation.
One can write “+/[1]” to sum along the row index (that is, to sum vertically)
or “+/[2]” to sum along the column index (that is, to sum horizontally).
“+/[2]” is assumed if “+/” is written. In this case, however, there is an
alternate notation for “+/[1]”: the symbol “⌿” (typed as /, backspace, -) replaces
“/”, so that “+⌿” means “sum along the first index.” The result is a vector.


All the remarks above apply to general reductions α/P, where α is a scalar
dyadic function (Appendix B).


      ⌈/P       Pick the maximum from each row.
      ⌊⌿P       Pick the minimum from each column.
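Below is a minimal Python sketch of reduction along either index. It assumes matrices are lists of rows, and the helper names are hypothetical. APL evaluates a reduction from right to left, so that -/1 2 3 is 1-(2-3), or 2. The helper `apl_reduce` reproduces that order.

```python
import operator
from functools import reduce


def apl_reduce(f, v):
    """APL f/V for a nonempty vector: evaluated right to left, a f (b f (c f ...))."""
    return reduce(lambda acc, x: f(x, acc), reversed(v[:-1]), v[-1])


def reduce_rows(f, m):
    """f/M: reduce along the last (column) index, giving one value per row."""
    return [apl_reduce(f, row) for row in m]


def reduce_columns(f, m):
    """f⌿M: reduce along the first (row) index, giving one value per column."""
    return [apl_reduce(f, list(col)) for col in zip(*m)]


P = [[1, 2, 3], [4, 5, 6]]
print(reduce_rows(operator.add, P))     # +/P
print(reduce_columns(operator.add, P))  # +⌿P
print(reduce_rows(max, P))              # ⌈/P
print(reduce_columns(min, P))           # ⌊⌿P
```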


EXAMPLE 1.34 Let M be the matrix


60 120  7200 3600
61 120  7320 3721
62 135  8370 3844
63 135  8505 3969
64 130  8320 4096
65 135  8775 4225
66 150  9900 4356
68 140  9520 4624
69 170 11730 4761
70 145 10150 4900
71 160 11360 5041
72 160 11520 5184
74 160 11840 5476
75 160 12000 5625
76 175 13300 5776


This is the body of Table 1.2 without the last row of column totals. Write an APL
expression to redefine M to be Table 1.2 with the last row included.


Solution
      ⎕←M←M,[1]+⌿M
  60  120   7200  3600
  61  120   7320  3721
  62  135   8370  3844
  63  135   8505  3969
  64  130   8320  4096
  65  135   8775  4225
  66  150   9900  4356
  68  140   9520  4624
  69  170  11730  4761
  70  145  10150  4900
  71  160  11360  5041
  72  160  11520  5184
  74  160  11840  5476
  75  160  12000  5625
  76  175  13300  5776
1016 2195 149810 69198
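The column totals can be cross-checked in Python. The sketch relies on an observation from the printed rows rather than a statement in the text: the third and fourth columns appear to be height×weight and height² (for example, 60×120 = 7200 and 60² = 3600). It rebuilds the matrix from heights and weights on that basis and then appends the totals, as M←M,[1]+⌿M does.

```python
heights = [60, 61, 62, 63, 64, 65, 66, 68, 69, 70, 71, 72, 74, 75, 76]
weights = [120, 120, 135, 135, 130, 135, 150, 140, 170, 145, 160, 160, 160, 160, 175]

# Columns 3 and 4 of the matrix above are, row by row, height×weight and height².
M = [[h, w, h * w, h * h] for h, w in zip(heights, weights)]

totals = [sum(col) for col in zip(*M)]   # like +⌿M
M = M + [totals]                         # like M←M,[1]+⌿M
print(M[-1])
```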


STANDARD SCORES (Z-SCORES) 


Statistical measurements are often made in quite arbitrary units, so that the raw
data from different experiments are difficult to compare. To give a simple exam-
ple, if the heights of the individuals in a population are measured by one experi-
menter in English units (feet, inches) and the same population is measured by
another experimenter in metric units, then, although the raw numbers are quite
different, the distribution of the two sets of data is in a sense the same. To bring
this out, the first step in many statistical analyses is to transform the raw data into
standard scores.


To do this, one uses the formula

$$z = \frac{x - \bar{x}}{\sigma}$$

Here x is a measurement, $\bar{x}$ is the mean of the measurements, and σ is the
standard deviation of the measurements.

As an illustration of data manipulation using matrices, we will turn a matrix
of raw data into a matrix of standard scores.

Let X be the vector of measured heights from Table 1.1 and Y the vector of
measured weights. Let A be a matrix with two columns: X and Y.

      A
60 120
61 120
62 135
63 135
64 130
65 135
66 150
68 140
69 170
70 145
71 160
72 160
74 160
75 160
76 175

First compute the X and Y means.

      ⎕←M←(+⌿A)÷15
67.73 146.3

Next subtract the means from the corresponding raw scores and call the result B.

      ⎕←B←A-(⍴A)⍴M
¯7.7330 ¯26.3300
¯6.7330 ¯26.3300
¯5.7330 ¯11.3300
¯4.7330 ¯11.3300
¯3.7330 ¯16.3300
¯2.7330 ¯11.3300
¯1.7330   3.6670
 0.2667  ¯6.3330
 1.2670  23.6700
 2.2670  ¯1.3330
 3.2670  13.6700
 4.2670  13.6700
 6.2670  13.6700
 7.2670  13.6700
 8.2670  28.6700

Notice that the expression (⍴A)⍴M reshapes M to the shape of A.

      (⍴A)⍴M
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300
67.730 146.300

Now the standard deviation σ of a set of numbers $x_1, \ldots, x_n$ with mean
$\bar{x} = (\sum x_i)/n$ is defined by

$$\sigma = \left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{1/2}$$

Thus the standard deviations are

      ⎕←SD←((+⌿B*2)÷15)*÷2
5.039 16.78

and the standard scores are

      ⎕←Z←B÷(⍴B)⍴SD
¯1.5350 ¯1.5690
¯1.3360 ¯1.5690
¯1.1380 ¯0.6754
¯0.9393 ¯0.6754
¯0.7408 ¯0.9734
¯0.5424 ¯0.6754
¯0.3440  0.2185
 0.0529 ¯0.3774
 0.2514  1.4100
 0.4498 ¯0.0794
 0.6482  0.8145
 0.8467  0.8145
 1.2440  0.8145
 1.4420  0.8145
 1.6400  1.7080
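The whole computation fits in a short Python sketch. It assumes a matrix is a list of rows and divides by n in the standard deviation, as the formula above does. The function name `standard_scores` is hypothetical. A column with zero spread would cause a division by zero, and the sketch does not guard against it.

```python
import math


def standard_scores(a):
    """Convert each column of a (a list of rows) to standard scores, dividing by n as in the text."""
    n = len(a)
    means = [sum(col) / n for col in zip(*a)]                           # (+⌿A)÷n
    b = [[x - m for x, m in zip(row, means)] for row in a]              # A-(⍴A)⍴M
    sds = [math.sqrt(sum(d * d for d in col) / n) for col in zip(*b)]   # ((+⌿B*2)÷n)*÷2
    return [[d / s for d, s in zip(row, sds)] for row in b]             # B÷(⍴B)⍴SD


heights = [60, 61, 62, 63, 64, 65, 66, 68, 69, 70, 71, 72, 74, 75, 76]
weights = [120, 120, 135, 135, 130, 135, 150, 140, 170, 145, 160, 160, 160, 160, 175]
A = [[h, w] for h, w in zip(heights, weights)]
Z = standard_scores(A)
print([round(z, 4) for z in Z[0]])
```

Each column of standard scores has mean 0 and (with this definition of σ) standard deviation 1, which is what makes data in different units comparable.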


THE OUTER PRODUCT 


The outer product is an APL function used to create matrices whose entries are
given by simple formulas. It is the dyadic operation denoted by the pair of
symbols “∘.” (often referred to as “jot-dot”). The outer product is used in conjunction
with a scalar dyadic function α. The form in which we will use it is


      M←V∘.αW
where V and W are vectors.

If V has m components and W has n components, then M is an m-by-n matrix
(i.e., ⍴M is (⍴V),⍴W) whose i, j component is given by the formula


      M[I;J]=V[I]αW[J]


EXAMPLE 1.35 (Kronecker product of two vectors) What is the result of the
expression


      4 ¯7 3 2∘.×2 1 2


Solution This is of the form V∘.αW with V the vector 4 ¯7 3 2, W the vector
2 1 2, and α the function ×. V is size 4 and W is size 3. Thus the result is 4
by 3. The i, j component of the result is V[I]×W[J]. Thus the result is


 4×2  4×1  4×2
¯7×2 ¯7×1 ¯7×2
 3×2  3×1  3×2
 2×2  2×1  2×2

or

  8  4   8
¯14 ¯7 ¯14
  6  3   6
  4  2   4
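The outer product is a nested comprehension in Python. Below is a minimal sketch, with a hypothetical function name, that reproduces Example 1.35. Any two-argument function can play the role of α, for example `operator.eq` for ∘.=.

```python
import operator


def outer(f, v, w):
    """Mimic APL V∘.fW: an m-by-n matrix with entry [i][j] equal to f(V[i], W[j])."""
    return [[f(x, y) for y in w] for x in v]


K = outer(operator.mul, [4, -7, 3, 2], [2, 1, 2])   # 4 ¯7 3 2∘.×2 1 2
for row in K:
    print(row)
```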


Definition 1.5 If V, W are vectors, the Kronecker product of V and W is the
matrix V∘.×W.


We shall have occasion to use the Kronecker product in subsequent sections.


EXAMPLE 1.36 (Hilbert matrix) The n-by-n Hilbert matrix is the matrix

$$\begin{bmatrix}
1 & 1/2 & 1/3 & \cdots & 1/n \\
1/2 & 1/3 & 1/4 & \cdots & 1/(n+1) \\
1/3 & 1/4 & 1/5 & \cdots & 1/(n+2) \\
\vdots & \vdots & \vdots & & \vdots \\
1/n & 1/(n+1) & 1/(n+2) & \cdots & 1/(2n-1)
\end{bmatrix}$$

Write an APL expression to create a 5-by-5 Hilbert matrix.


Solution The i, j entry of the matrix is 1/(i + j - 1), and the indices i and j both
run from 1 to 5. First set up vectors of indices


      I←J←⍳5


Then a 5-by-5 matrix H with entries H[i;j] = 1/(i + j - 1) is just


      ÷I∘.+J-1
1.0000 0.5000 0.3333 0.2500 0.2000
0.5000 0.3333 0.2500 0.2000 0.1667
0.3333 0.2500 0.2000 0.1667 0.1429
0.2500 0.2000 0.1667 0.1429 0.1250
0.2000 0.1667 0.1429 0.1250 0.1111
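Below is a Python sketch of the same construction. It uses the standard `fractions` module so that the entries 1/(i + j - 1) are exact rather than rounded decimals like those in the display. The function name `hilbert` is hypothetical.

```python
from fractions import Fraction


def hilbert(n):
    """n-by-n Hilbert matrix with exact entries H[i;j] = 1/(i + j - 1), i and j from 1 to n."""
    idx = range(1, n + 1)                   # I←J←⍳n
    return [[Fraction(1, i + j - 1) for j in idx] for i in idx]


H = hilbert(5)
for row in H:
    print(" ".join(f"{float(x):.4f}" for x in row))
```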


If A is an m-by-n matrix, then +/+/A is $\sum_{i=1}^{m}\sum_{j=1}^{n} A[i;j]$ and +/+⌿A is
$\sum_{j=1}^{n}\sum_{i=1}^{m} A[i;j]$.† This observation can be used in conjunction with the outer
product to evaluate such double sums.

† Both these sums are, of course, equal to +/,A as well. But +/+/ emphasizes the double-sum nature
of the computation.


EXAMPLE 1.37 Evaluate

$$\sum_{i=1}^{20}\sum_{j=1}^{30}(2i + j^3)$$


Solution If A is a 20-by-30 matrix with A[i;j] = 2i + j³, then +/+/A is the
sum sought. First, set up vectors of indices

      I←⍳20
      J←⍳30

Then the sum sought is

      +/+/(2×I)∘.+J*3
4337100


EXAMPLE 1.38 Evaluate

$$\sum_{i=1}^{20}\sum_{j=1}^{10}(i + j)(i - j)$$


Solution The easiest way to do this is to set up two 20-by-10 matrices A and B
with A[i;j] = i + j and B[i;j] = i - j and compute +/+/A×B.


      I←⍳20
      J←⍳10
      +/+/(I∘.+J)×I∘.-J
21000
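In Python the same double sums can be written with a generator over both index ranges. Below is a minimal sketch with a hypothetical helper `double_sum`. Its totals can be checked against closed forms: in Example 1.37, 30·(2·210) + 20·465² = 4337100. In Example 1.38, 10·2870 - 20·385 = 21000.

```python
def double_sum(f, m, n):
    """Sum f(i, j) for i = 1..m and j = 1..n, the analogue of +/+/A with A[i;j] = f(i, j)."""
    return sum(f(i, j) for i in range(1, m + 1) for j in range(1, n + 1))


print(double_sum(lambda i, j: 2 * i + j ** 3, 20, 30))       # Example 1.37
print(double_sum(lambda i, j: (i + j) * (i - j), 20, 10))    # Example 1.38
```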
EXAMPLE 1.39 Evaluate 
20 
> 
2 
Solution 
fmsese0 
te eee) 
100 . 


EQUALITY CONVENTION 


There is a conflict between the use of the equality sign in conventional
mathematical discourse and the use of the dyadic APL function “=”. If A and B
are matrices, then in conventional mathematical discourse “A = B” means that A
and B are the same matrix, that is, have the same entries. The APL expression
“A=B” results in a matrix of zeros and ones. A one means that the corresponding
entries of A and B are equal, and a zero indicates that the corresponding entries
of A and B are not equal. The conventional expression “A = B” corresponds to
the case where the APL result “A=B” contains no zero entries.

To reconcile the two, we use the following convention: An APL expression,
such as A=B, which results in a matrix of zeros and ones is assumed to contain no
zeros. This convention is adopted to allow us to write phrases such as “AαB is
defined whenever (⍴A)=⍴B.” Exceptions to the convention, such as “What is
the result of the expression 3=3%-1 a 12” should be clear from context. 


EXAMPLE 1.40 (Identity matrix) The n-by-n identity matrix ID has ID[i;j] = 0
when i ≠ j and ID[i;i] = 1. Write an APL expression to create the N-by-N
identity matrix.


Solution The APL expression I=J results in a 1 when I and J are equal and a
0 when they are not. Thus the identity matrix can be represented as


1=1 1=2 1=3
2=1 2=2 2=3
3=1 3=2 3=3


Hence a solution is
      ID←(⍳N)∘.=⍳N
An entirely different solution is


      ID←(N,N)⍴1,N⍴0
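Both solutions can be compared in Python. Below is a minimal sketch with hypothetical function names. The second solution works because the pattern 1 followed by N zeros has period N+1. When the N×N entries are read row by row, the 1s therefore land exactly at positions r·(N+1), which is the main diagonal.

```python
def identity_outer(n):
    """ID←(⍳N)∘.=⍳N: entry is 1 exactly when the row and column indices agree."""
    return [[int(i == j) for j in range(1, n + 1)] for i in range(1, n + 1)]


def identity_reshape(n):
    """ID←(N,N)⍴1,N⍴0: cycle the pattern 1 0 ... 0 (N zeros) through an N-by-N matrix."""
    pattern = [1] + [0] * n
    flat = [pattern[k % (n + 1)] for k in range(n * n)]
    return [flat[r * n:(r + 1) * n] for r in range(n)]


print(identity_outer(3))
print(identity_reshape(3))
```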


EXERCISES 1.3 


In exercises 1 through 15 a matrix M is defined. What is M? Answers should be worked out
by hand and checked at a terminal if one is available.


Mod 2pid 2 Med Bpid 3. Met apd 
4 Med 2904 S. Mea tpt 6 MES Ay 
MI3;21-6 MLZ: 1-0 MLA Ie4p0 
MU.3}u5p 1 
7. mS 4pi5 BMS 5016 9 Mes Sp 16 
MU:3)=-5p 4 MIS JeM{.3) MLS: JeM{ 96-15) 
M{3; Jn4 00 
10. ms Sei8 Mh Med 30(301), (302) .303 
MUS; 1eMI6—15 3) 
12, M3 36(391). (302) .303 13, M3 Spo 
MUL 3: )eMI3 1.) MU 351 3J-2 299 7.3.1 
14. m3 3919 1S. Med 3p 19 
MU1 3:1 3}eME3 1:3 1) M(B; JeME1;30 1) 


16. Let M be a matrix and let I, J be arrays of valid indices for M. The rule determining
the shape of M[I;J] is (⍴M[I;J])=(⍴I),⍴J. For example, (⍴M[1 2;1 2 3])=(⍴1 2),⍴1 2 3,
which is 2 3, so M[1 2;1 2 3] is 2 by 3. Since the shape of a scalar is ⍳0, ⍴M[2;3] is
(⍴2),⍴3, or (⍳0),⍳0, or ⍳0. Thus M[2;3] is a scalar.

Use this rule to determine the shape of the arrays defined below. The answers may be
checked at a terminal. Assume (⍴M)=10 10.


(a) pt 251 24 (b) Mias1 2) (e) MEL 2287 




(a) Mtn 21 (e) Mtatt ay () Mt.3s) 
(2) MU.6) (b) Mit @ mp.a:.4 
() Mt.3i4) (ky Mia sal () Mp2 2e04-91 


(m) MU1:2 204d 


In exercises 17 through 30 a matrix M is defined. What is M?


17, aed ania 18. Aq? Spd 19, An2 Spe4 
BH? 32 Bra(2 153 21) Beal? 1.3 2 1) 
MoavB MAB MAB 

20. Ar? anid Ql An? Spa 22, Au? Spi 
Baz 1)9 21) BeA(2 1:3 2 1) BAL? 1:3 21) 
MBA Mara 

DB. Ae? Bpid 24. And 3p 04 25 
BeA(2 1:32 1) Beatz 1:3 21) 

MAB MeA, (218 MeAL(1IB 

26, An? apid 27 Aad Byrd 28. Ar? Sed 
M5 6A MAY Men UTA 

29. And Syed 30. An? 3pi4 
M2 tA M-0.(1}0.(A.11}0).0 


31. This exercise is based on Table 1.3. Measurements were taken from two groups
consisting of boys aged 9 to 11. The measurements were repeated after six months. One
group, the control, was a sample from the general population. The other group, the
experimental, consisted of boys engaged in competitive swimming.

The measurements were height (in inches), weight (in kilograms), and fitness (in liters


Height Weight Fitness Height Weight Finess_| 
Pre Post | Pre Post | Pre Post| | Pre Post | Pre Post 
3300 5470 | 3159 3338] 128 133 | s0s0 51.0 | 25.50 16s 
5410 55:10 | 2727 2680| 166 149) | $675 7.60 | 35.40 182 
53.10 54.30 | 2955 3050] 128 137) | $650 57.60 | 33.60 193 
30 6190 | 4259 4425 | 197 223) | 5450 55.80 | 36.90 1.84 | 
| $520. 56.10 | 2985 3030 | 164 1.70 | | $425 55.20 | 28.60 189 
3630 3800 | 3477 3500 141 1.83) | $300 $370 | 30.40 210 
37.40 58:80 | 35.68 36.50) 143 1.61 | | $300 5830 | 29.10 16s 
$480 55:10 | 27.05 2725) 143 141 | | $525 $6.70 | 30.90 150 
5750 58.90 | 3614 3720) Let 155 | | 4950 50.70 | 24.10 150 
$9.00 61.50 | 39.55 39.10) 179 1.90 | | $6.00 $6.90 | 30.90 173 
$580 56.90 | 4173 4190) 174 179] | S480 $8.00 | 26.40 139 
5750 59.80 | 3068 3335| 133 188! | $875 60.70 | 3730 205 
$360 5470 | 32,73 3530| 153 165 | | 5120 5270 | 2680 176 
51.60 52.70 | 2600 2520] 148 154| | 57.00 S400 | 4730 51.10) 210 227 
3680 57.90 | 3159 3310| 136 144| | Sogo S140 | 2648 2830] 133 154 

Control = Experimental 


Source: M. Karpman, personal communication.


of oxygen per minute). The “fitness” measure is oxygen consumption while walking a
treadmill.
(a) Store these tables as two matrices, CON and EXP.


(b) “Fitness” is an absolute measure that does not take body size into account. Add
two new columns to CON and EXP consisting of fitness divided by weight expressed in
milliliters/kilogram-minute.


(c) Reduce CON and EXP to standard scores.


Write APL expressions to compute the double sums in exercises 32 through 40. Evaluate 
the expressions at a terminal if one is available. 


w 2 w 2 » © 
2 Sits 3. SYi-y MOS Si-/ 

é aa aa 

w > » 30 io 
NN ae gi) 36. SSH 3. SS (sin ix/6NcO8/x/6) 


ae 


w ww 2 


Si max (sin i,cosy) 39. gat oS Saath 


Let R be the rectangle in the xy plane bounded by the coordinate axes, the line
x = a, and the line y = b. Divide the interval [0,a] into n subintervals
$x_0 = 0 < x_1 < \cdots < x_n = a$ and divide the interval [0,b] into m subintervals
$y_0 = 0 < y_1 < \cdots < y_m = b$. Let $A_{ij}$ be the area of the rectangle $x_{i-1} \le x \le x_i$,
$y_{j-1} \le y \le y_j$. Then an approximation to the double integral $\iint_R f(x,y)\,dA$ is the double
sum $\sum_{i=1}^{n}\sum_{j=1}^{m} f(x_i, y_j)\,A_{ij}$. If $x_i = ia/n$ and $y_j = jb/m$, then $A_{ij} = ab/nm$.


For example, if a = b = 1 and n = m = 25, then an approximation to
$\iint_R 4xy\,dA = 1$ is given by


      I←J←(⍳25)÷25


and


      (+/+/4×I∘.×J)÷25*2
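Below is a minimal Python sketch of the approximating double sum. It assumes right-endpoint sample points $x_i = ia/n$ and $y_j = jb/m$, as above. The function name `riemann_double_sum` is hypothetical. For f(x, y) = 4xy on the unit square with n = m = 25, the sum is 4·13²/625 = 1.0816. It approaches the exact value 1 as n and m grow.

```python
def riemann_double_sum(f, a, b, n, m):
    """Approximate the double integral of f over [0,a]×[0,b] by the double sum
    of f(x_i, y_j)·A_ij with x_i = i·a/n, y_j = j·b/m and A_ij = ab/(nm)."""
    area = a * b / (n * m)
    total = sum(f(i * a / n, j * b / m) for i in range(1, n + 1) for j in range(1, m + 1))
    return total * area


print(riemann_double_sum(lambda x, y: 4 * x * y, 1, 1, 25, 25))
```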


In exercises 41 through 45 use this approximation to estimate values of the given double
integral using the given values of m and n.


al ffxsinyaa, tea 
Se ee 
4B. ieee Bale 
44, ii via Hyd, R te 
© ffmou 0S 

J. < 
46. (Sieve of Eratosthenes)

(a) By hand calculation show that
      V←2=+⌿0=(⍳10)∘.|⍳10
is a vector of zeros and ones with V[i] = 1 if and only if i is a prime number less
than or equal to 10.

(b) Let N be a positive integer and define A by A←(⍳N)∘.|⍳N. Show that for
any positive integer i less than or equal to N the number of zero components of the
vector A[;i] is the number of distinct positive divisors of i.

(c) Using part (b), show that the vector V of zeros and ones defined by


      V←2=+⌿0=(⍳N)∘.|⍳N


has V[i] = 1 if and only if i is a prime less than or equal to N.
Note: To produce the primes themselves from the vector V one may use the
expression V/⍳N, where “/” denotes the compression function defined in Section 3.4.
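The divisor-counting idea of this exercise can be sketched in Python. The sketch mirrors the APL expression rather than an efficient sieve, since it builds the full N-by-N residue table. It assumes APL's convention that i|j is the remainder of j on division by i. The function name `prime_mask` is hypothetical.

```python
def prime_mask(n):
    """V←2=+⌿0=(⍳N)∘.|⍳N: V[j] is 1 exactly when j has two positive divisors."""
    # A[i][j] is the APL residue i|j, i.e. j mod i, for i, j = 1..n.
    a = [[j % i for j in range(1, n + 1)] for i in range(1, n + 1)]
    # +⌿0=A: for each column j, count the rows i with zero residue (the divisors of j).
    divisor_counts = [sum(1 for row in a if row[col] == 0) for col in range(n)]
    return [int(c == 2) for c in divisor_counts]


V = prime_mask(10)
primes = [k for k, flag in zip(range(1, 31), prime_mask(30)) if flag]   # like V/⍳N
print(V, primes)
```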


CHAPTER TWO 


Matrix Algebra 


In this chapter we begin the study of linear algebra proper with the algebra of 
matrices. 

A multiplication for matrices is defined in Section 2.1. The set of matrices
endowed with this multiplication and the addition defined in Chapter 1
(Definition 1.3) forms an algebraic system extending the familiar algebra of scalar
quantities.

Using this new algebra, we may write systems of scalar equations as a single
matrix equation and then proceed to solve the matrix equation in a manner
analogous to that for solving a single scalar equation.

This matrix algebra is developed in Sections 2.1, 2.2, and 2.3. The other
sections of this chapter, except for Section 2.4, contain common applications of
matrix algebra.

Sections 2.5* and 2.6* apply matrix algebra to the calculus of functions of
several variables. These sections will be most useful to readers who have some
acquaintance with partial differentiation. They are called multivariate calculus
sections.

Section 2.4 discusses the class of functions central to matrix algebra: affine
and quadratic functions. The content of Section 2.4 is not necessary for Chapter 3
but is needed for Chapter 4 and the multivariate calculus Sections 2.5* and 2.6*.


2.1 Matrix Multiplication 


First a word about the differences between matrices and vectors. Matrices are
doubly indexed arrays. If M is a matrix, then it takes two indices to specify a
component of M, for example, M[2;4]. Vectors are singly indexed arrays. If V is
a vector, then it takes one index to specify a component of V, for example, V[3].

Because of the way in which vectors are displayed, it is tempting to think of a
vector as a matrix with but a single row. We will never do this.

In matrix algebra, however, it is often very convenient to blur the distinction
between vectors and matrices with a single column. In formal situations and in




82 Matrix Algebra 


writing expressions where indexing is involved (for example, M[3;2]←V[4]),
it is necessary to carefully preserve the matrix-vector distinctions. In other
situations, however, especially when writing out the matrix-vector products defined
below in Definition 2.1, it is convenient to write vectors vertically and ignore the
details involved in indexing components. The reason is that the vectors we are
concerned with are often the columns of a given matrix.

Vectors written in a vertical format will be referred to as column vectors. 

In conventional mathematical notation, matrices are usually set off typo- 
graphically by brackets, parentheses, or braces. For example, 


IE ] ( ) iF | 
34 34, 304 
Examples of this notation are: 


BE D3 2 
2 
6 
1 


$$3\begin{bmatrix}1&2\\7&6\\0&1\end{bmatrix} = \begin{bmatrix}3&6\\21&18\\0&3\end{bmatrix}$$
MATRIX TIMES VECTOR
Let A be a matrix with n columns. The columns of A are vectors. Let $u_i = A[;i]$ for
$i = 1, \ldots, n$. Let $v = (a_1, \ldots, a_n)$ be a vector with n components. The
matrix-vector product Av is the vector

$$a_1u_1 + a_2u_2 + \cdots + a_nu_n$$

That is, Av is that linear combination of the columns of A with coefficients the
components of v.


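EXAMPLE 2.1

$$\begin{bmatrix}1&3\\2&0\\-1&5\end{bmatrix}\begin{bmatrix}2\\-2\end{bmatrix} = 2\begin{bmatrix}1\\2\\-1\end{bmatrix} + (-2)\begin{bmatrix}3\\0\\5\end{bmatrix} = \begin{bmatrix}2\cdot1+(-2)\cdot3\\2\cdot2+(-2)\cdot0\\2\cdot(-1)+(-2)\cdot5\end{bmatrix} = \begin{bmatrix}-4\\4\\-12\end{bmatrix}$$

Here we have written the vectors (2,-2) and (-4,4,-12) as column
vectors.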
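The definition can be followed literally in Python: scale each column of A by the matching component of v and add. The sketch below assumes A is stored as a list of rows. The function name is hypothetical.

```python
def mat_vec_by_columns(a, v):
    """Av as the linear combination v[0]·(column 1 of A) + ... + v[n-1]·(column n of A)."""
    rows, cols = len(a), len(a[0])
    if cols != len(v):
        raise ValueError("A must have as many columns as v has components")
    result = [0] * rows
    for k in range(cols):
        column = [a[i][k] for i in range(rows)]          # u_k = A[;k]
        result = [r + v[k] * c for r, c in zip(result, column)]
    return result


A = [[1, 3], [2, 0], [-1, 5]]
print(mat_vec_by_columns(A, [2, -2]))   # Example 2.1
```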


A more formal definition will be useful later for checking identities. 


Definition 2.1 Let A be a matrix and V a vector. Suppose that (⍴A)[2]=⍴V.
Then AV is the vector with (⍴AV)=(⍴A)[1] and


1 Matrix Multiplication $3 


$$(AV)[i] = \sum_{k=1}^{\rho V} A[i;k]\,V[k]$$


for all valid indices i.
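SYSTEMS OF LINEAR EQUATIONS


Matrix-vector multiplication enables one to write a system of linear equations in
compact form. Consider, for example, the system of linear equations


3x + 2y = 7
9x - 5y = 12
6x - 7y = 5


These three scalar equations may be written as the single vector equation

$$x\begin{bmatrix}3\\9\\6\end{bmatrix} + y\begin{bmatrix}2\\-5\\-7\end{bmatrix} = \begin{bmatrix}7\\12\\5\end{bmatrix}$$

or, using matrix-vector multiplication,

$$\begin{bmatrix}3&2\\9&-5\\6&-7\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}7\\12\\5\end{bmatrix}$$

More generally, given a system of m equations in the n unknowns $x_1, \ldots, x_n$,

$$a_{i1}x_1 + a_{i2}x_2 + a_{i3}x_3 + \cdots + a_{in}x_n = y_i \qquad (i = 1, \ldots, m)$$

The m scalar equations may be rewritten as the vector equation

$$x_1\begin{bmatrix}a_{11}\\ \vdots\\ a_{m1}\end{bmatrix} + x_2\begin{bmatrix}a_{12}\\ \vdots\\ a_{m2}\end{bmatrix} + \cdots + x_n\begin{bmatrix}a_{1n}\\ \vdots\\ a_{mn}\end{bmatrix} = \begin{bmatrix}y_1\\ \vdots\\ y_m\end{bmatrix}$$

or

$$\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\ \vdots&\vdots&&\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn}\end{bmatrix}\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} = \begin{bmatrix}y_1\\ \vdots\\ y_m\end{bmatrix}$$

or, finally,

$$AX = Y$$

where $A[i;j] = a_{ij}$, $X[j] = x_j$, and $Y[i] = y_i$.
In this chapter and the next an algebra of matrices will be developed to solve
such equations.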
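As a concrete check that the matrix form encodes the three scalar equations, the Python sketch below multiplies the coefficient matrix by a candidate solution and compares the result with the right-hand side. The candidate x = 59/33, y = 9/11 is our own computation, obtained by eliminating x between the first two equations. It is not given in the text. Exact fractions avoid rounding.

```python
from fractions import Fraction

# Coefficient matrix and right-hand side of 3x+2y=7, 9x-5y=12, 6x-7y=5.
A = [[3, 2], [9, -5], [6, -7]]
rhs = [7, 12, 5]

# Candidate solution obtained (by us, not the text) by eliminating x from the first two equations.
X = [Fraction(59, 33), Fraction(9, 11)]

# Each entry of AX is the sum of products of a row of A with X.
AX = [sum(a_ik * x_k for a_ik, x_k in zip(row, X)) for row in A]
print(AX == rhs)
```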


APL NOTATION 


If A is a matrix, V a vector, and (⍴A)[2]=⍴V, then the matrix-vector product is
written in APL notation as


      A+.×V


The construction “+.×” (read “sum of products”) is a particular example of the
APL generalized inner product discussed later in this section. A formal definition
will be useful later.


Definition 2.2 Let V, W be vectors with (⍴V)=⍴W. Then the scalar V+.×W
is defined by the identity


      (V+.×W)=+/V×W


That is, V+.×W is, in conventional notation,

$$\sum_{i=1}^{\rho V} V[i]\,W[i]$$


Definition 2.3 Let A be a matrix, V and W vectors with (⍴V)=(⍴A)[2]
and (⍴W)=(⍴A)[1]. Then the vectors A+.×V and W+.×A are defined by the
identities


      (A+.×V)[I]=A[I;]+.×V
      (W+.×A)[I]=W+.×A[;I]


for all valid indices I.
Notice that since A[I;] and A[;I] are vectors, the right-hand sides of the
identities in Definition 2.3 are defined in Definition 2.2. Using the conventional
notation from Definition 2.2, (A+.×V)[i] is

$$\sum_{k=1}^{\rho V} A[i;k]\,V[k]$$

which is the definition of matrix-vector multiplication given above (Definition 2.1).
EXAMPLE 2.2 Perform the computation of Example 2.1 in APL.


      A←3 2⍴1 3 2 0 ¯1 5
      V←2 ¯2
      ⎕←W←A+.×V
¯4 4 ¯12
      ⍴A
3 2
      ⍴V
2
      ⍴W
3


Notice that V is a vector, not a 2-by-1 matrix, and W is also a vector.
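Definitions 2.2 and 2.3 reduce matrix-vector products to dot products of rows or columns. Below is a minimal Python sketch of both, assuming a matrix is a list of rows. The function names are hypothetical.

```python
def dot(v, w):
    """Definition 2.2: V+.×W is +/V×W for vectors of the same size."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same size")
    return sum(x * y for x, y in zip(v, w))


def mat_vec(a, v):
    """Definition 2.3: (A+.×V)[I] = A[I;]+.×V, one dot product per row of A."""
    return [dot(row, v) for row in a]


def vec_mat(w, a):
    """Definition 2.3: (W+.×A)[I] = W+.×A[;I], one dot product per column of A."""
    return [dot(w, [row[i] for row in a]) for i in range(len(a[0]))]


A = [[1, 3], [2, 0], [-1, 5]]
W = mat_vec(A, [2, -2])     # Example 2.2
print(W, len(W))
```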


MATRIX TIMES MATRIX 


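If A and B are matrices and the number of columns of A equals the number of
rows of B, then the matrix-matrix product AB is defined as follows. Each column
of B is a vector. The ith column of AB is A times the ith column of B:

$$(AB)[;i] = A\,(B[;i])$$

EXAMPLE 2.3

$$A = \begin{bmatrix}1&3\\2&0\\-1&5\end{bmatrix}, \qquad B = \begin{bmatrix}-1&3\\1&1\end{bmatrix}$$

$$A\begin{bmatrix}-1\\1\end{bmatrix} = (-1)\begin{bmatrix}1\\2\\-1\end{bmatrix} + 1\begin{bmatrix}3\\0\\5\end{bmatrix} = \begin{bmatrix}2\\-2\\6\end{bmatrix}$$

$$A\begin{bmatrix}3\\1\end{bmatrix} = 3\begin{bmatrix}1\\2\\-1\end{bmatrix} + 1\begin{bmatrix}3\\0\\5\end{bmatrix} = \begin{bmatrix}6\\6\\2\end{bmatrix}$$

Thus,

$$\begin{bmatrix}1&3\\2&0\\-1&5\end{bmatrix}\begin{bmatrix}-1&3\\1&1\end{bmatrix} = \begin{bmatrix}2&6\\-2&6\\6&2\end{bmatrix}$$

The procedure of writing out the linear combination of the columns each time two
matrices are multiplied is not practical. One works through the columns of B in
order, calculating the corresponding columns of the product, skipping the
intermediate step of writing out the linear combinations:

$$\begin{bmatrix}1&3\\2&0\\-1&5\end{bmatrix}\begin{bmatrix}-1&3\\1&1\end{bmatrix} = \begin{bmatrix}1\cdot(-1)+3\cdot1 & 1\cdot3+3\cdot1\\ 2\cdot(-1)+0\cdot1 & 2\cdot3+0\cdot1\\ (-1)(-1)+5\cdot1 & (-1)\cdot3+5\cdot1\end{bmatrix} = \begin{bmatrix}2&6\\-2&6\\6&2\end{bmatrix}$$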


APL Notation 


If A and B are matrices, the matrix product is again given in APL by the
expression A+.×B. In this case the definition is by means of the identity


      (A+.×B)[;J]=A+.×B[;J]
The right-hand side of this equation is a matrix-vector product.
EXAMPLE 2.4 Perform the computation of Example 2.3 in APL.
      A←3 2⍴1 3 2 0 ¯1 5
      B←2 2⍴¯1 3 1 1
      A+.×B
 2.000 6.000
¯2.000 6.000
 6.000 2.000
The following formula is sometimes useful.
Proposition 2.1 Let A and B be matrices with AB defined. Then
      (A+.×B)[I;J]=A[I;]+.×B[;J]
Alternately,

$$(AB)[i;j] = \sum_{k=1}^{n} A[i;k]\,B[k;j]$$

where n is the number of columns of A.


Proof (A+.×B)[i;j] is the ith component of the vector (A+.×B)[;j]. The ith
component of V = (A+.×B)[;j] is written V[i] = ((A+.×B)[;j])[i]. This gives
us the identity


      (A+.×B)[i;j] = ((A+.×B)[;j])[i]


but (A+.×B)[;j] = A+.×B[;j] by definition. The product A+.×B[;j] is the
product of the matrix A and the vector B[;j], and hence by the definition of
matrix-vector product


      (A+.×B[;j])[i] = A[i;]+.×B[;j]
In summary,
      (A+.×B)[i;j] = ((A+.×B)[;j])[i]
                   = (A+.×B[;j])[i]
                   = A[i;]+.×B[;j] .
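The column-by-column definition and the entry formula of Proposition 2.1 can be checked against each other in Python. Below is a minimal sketch, assuming matrices are lists of rows. The helper names are hypothetical.

```python
def column(b, j):
    """B[;J] as a flat list (0-based j)."""
    return [row[j] for row in b]


def mat_mul_by_columns(a, b):
    """(A+.×B)[;J] = A+.×B[;J]: build AB one column at a time."""
    if len(a[0]) != len(b):
        raise ValueError("columns of A must equal rows of B")
    cols = [[sum(x * y for x, y in zip(row, column(b, j))) for row in a] for j in range(len(b[0]))]
    return [list(r) for r in zip(*cols)]


def mat_mul_entries(a, b):
    """Proposition 2.1: (AB)[i;j] = sum over k of A[i;k]·B[k;j]."""
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 3], [2, 0], [-1, 5]]
B = [[-1, 3], [1, 1]]
print(mat_mul_by_columns(A, B))   # Example 2.3
```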
If A has m rows and n columns and B has p rows and q columns, then AB is
not defined unless n = p. If n = p, then AB = C is defined and C has m rows and q
columns:

$$\underset{m \times n}{A}\;\underset{n \times q}{B} = \underset{m \times q}{C}$$

Even if AB is defined, BA need not be (for example, 3 2=⍴A and 2 5=⍴B). If
both AB and BA are defined, they need not be of the same shape, as the next
example shows.


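EXAMPLE 2.5

$$AB = \begin{bmatrix}1&0\\0&1\\0&0\end{bmatrix}\begin{bmatrix}1&0&0\\0&1&0\end{bmatrix} = \begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}$$

$$BA = \begin{bmatrix}1&0&0\\0&1&0\end{bmatrix}\begin{bmatrix}1&0\\0&1\\0&0\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$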


A matrix is called square if the number of rows equals the number of columns. If
A and B are square and AB is defined, then BA is defined. Further, AB and BA are
both square and have the same shape. Still, AB ≠ BA in general.


EXAMPLE 2.6
ee 
a=[ ib sl=[s 
=G sh al-Ls sl + 
In algebra the identity ab = ba is called the commutative law of multiplication. 


Matrix multiplication does not obey the commutative law. Matrix multiplication 
is noncommutative. 


Transpose 


Let A be a matrix. The transpose of A, written ⍉A in APL and $A^T$ otherwise, is the
matrix whose rows are the columns of A. Thus, if A is m-by-n, then $A^T$ is n-by-m.


EXAMPLE 2.7


The main diagonal of a matrix A is the vector with components A[1;1], A[2;2],
A[3;3], ..., A[m⌊n;m⌊n].


The matrix $A^T$ is obtained by “flipping” A about its main diagonal, hence
the APL notation ⍉.* Incidentally, the main diagonal of A is denoted in APL by
1 1⍉A.


* ⍉ is typed ○, backspace, \.


EXAMPLE 2.8
      ⎕←A←3 5⍴⍳15
 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
      ⍉A
1  6 11
2  7 12
3  8 13
4  9 14
5 10 15
      1 1⍉A
1 7 13


A formal definition of $A^T$ is useful in checking identities.
Definition 2.4 Let A be a matrix. The matrix ⍉A is defined by


1. (⍴A)[2 1]=⍴⍉A
2. (⍉A)[i;j] = A[j;i] for every pair of indices i, j.


If A is a vector or scalar, then A = ⍉A.
The next proposition gives two useful identities. The first identity emphasizes
the noncommutative nature of matrix multiplication.


Proposition 2.2 Let A, B be matrices. If AB is defined, then

$$(AB)^T = B^TA^T$$

and if A + B is defined, then

$$(A + B)^T = A^T + B^T$$


Proof Let A be m-by-n and B n-by-q. Then AB is m-by-q, so that $(AB)^T$ is q-by-m.
Since $B^T$ is q-by-n and $A^T$ is n-by-m, $B^TA^T$ is defined and is q-by-m. Thus the
shapes are correct. The equality of the individual components is most easily
checked using APL notation.

Since (⍉A)[i;j] = A[j;i], it follows that (⍉A)[i;] = A[;i] and (⍉A)[;j] = A[j;].
Thus,


      (⍉A+.×B)[i;j] = (A+.×B)[j;i]           (definition of ⍉)
                    = A[j;]+.×B[;i]           (Proposition 2.1)
                    = (⍉A)[;j]+.×(⍉B)[i;]
                    = +/(⍉A)[;j]×(⍉B)[i;]
                    = +/(⍉B)[i;]×(⍉A)[;j]     (these arrays are vectors)
                    = (⍉B)[i;]+.×(⍉A)[;j]
                    = ((⍉B)+.×⍉A)[i;j]        (Proposition 2.1)


for all indices i, j. ∎


A matrix is called symmetric if $A = A^T$. The symmetric matrices form an
important class of matrices that will be studied in some detail in later chapters. It
follows from the above proposition that matrices of the form $A^TA$ or $AA^T$ are
always symmetric (Exercise 6).
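The identity $(AB)^T = B^TA^T$ and the symmetry of $A^TA$ can be spot-checked numerically. The Python sketch below does this on the matrices of Example 2.3 and also extracts a main diagonal, the analogue of 1 1⍉A. It assumes matrices are lists of rows, and the function names are hypothetical. A numerical check on one example illustrates the proposition but does not prove it.

```python
def transpose(a):
    """Definition 2.4: (⍉A)[i;j] = A[j;i]; an m-by-n matrix becomes n-by-m."""
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    """(AB)[i;j] = sum over k of A[i;k]·B[k;j] (Proposition 2.1)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def main_diagonal(a):
    """1 1⍉A: the entries A[k;k] for k up to the smaller of the two dimensions."""
    return [a[k][k] for k in range(min(len(a), len(a[0])))]


A = [[1, 3], [2, 0], [-1, 5]]
B = [[-1, 3], [1, 1]]
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))   # (AB)^T = B^T A^T
S = mat_mul(transpose(A), A)                                             # A^T A
print(S == transpose(S))
```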


GENERALIZED INNER PRODUCT 


The “sum of products” operation “+.×” used above is a specific example of a
more general APL construction that we will sometimes use. The construction
generalizes in two ways.

First, the “+” and “×” may be replaced by any of the scalar dyadic functions
from Appendix B. Thus, given vectors A and B, one could form the “sum of
quotients” A+.÷B (= +/A÷B) or the “product of differences” A×.-B (= ×/A-B). If
either A or B is a matrix, then the obvious changes in the identities used to define
A+.×B give the appropriate definition.

Second, the operation is always defined if one of the arguments is a scalar or
single-component vector. The scalar is first reshaped to a vector of the proper size.


EXAMPLE 2.9 (Sum of squares) One often wishes to compute the sum of squares
of the components of a vector V. This is just V+.*2 (recall that * is
exponentiation). For example,


      1 ¯1 3+.*2 is 1² + (¯1)² + 3², or 11.


EXAMPLE 2.10 (Averages) One often wishes to compute the average of a set of
numbers. Let the numbers be stored as the components of a vector V. Then the
average is just V+.÷⍴V.

Suppose that V←1 2 3. Then ⍴V is 3 (the single-element vector) and
V+.÷⍴V is (1÷3)+(2÷3)+(3÷3), which is 2.


EXAMPLE 2.11 (Equality of vectors) In later chapters we will occasionally need
an APL expression that will return a 1 if two vectors are equal and a 0 if they are
unequal. If V and W are scalars, then the expression V=W will do. If V and W are
vectors, then the scalar dyadic function = operates component by component.
Thus 1 2 3=1 0 3 is 1 0 1.


To check the equality of the vectors V and W use V∧.=W. For example,
1 2 3∧.=1 0 3 is (1=1)∧(2=0)∧(3=3), which is 1∧(0∧1), which is 0.

In particular, the expression V∧.=0 is 1 if V is the zero vector and is 0
otherwise.
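The generalized inner product V f.g W can be sketched in Python for vectors. The sketch assumes nonempty arguments and reduces from right to left as APL does. It extends a one-element argument to the other's size, which mirrors the reshaping rule above. The names `inner`, `both`, and `equal` are hypothetical stand-ins for the APL construction and for ∧ and =.

```python
import operator
from functools import reduce


def apl_reduce(f, v):
    """f/V for a nonempty vector, evaluated right to left as APL does."""
    return reduce(lambda acc, x: f(x, acc), reversed(v[:-1]), v[-1])


def inner(f, g, v, w):
    """V f.g W for vectors: f/ applied to V g W, extending a one-element argument first."""
    v, w = list(v), list(w)
    if len(v) == 1:
        v = v * len(w)
    if len(w) == 1:
        w = w * len(v)
    if len(v) != len(w):
        raise ValueError("vectors must have the same size")
    return apl_reduce(f, [g(x, y) for x, y in zip(v, w)])


def both(p, q):
    """APL ∧ on 0/1 values."""
    return int(p and q)


def equal(p, q):
    """APL = on scalars, returning 1 or 0."""
    return int(p == q)


print(inner(operator.add, operator.pow, [1, -1, 3], [2]))        # 1 ¯1 3+.*2
print(inner(operator.add, operator.truediv, [1, 2, 3], [3]))     # V+.÷⍴V
print(inner(both, equal, [1, 2, 3], [1, 0, 3]))                  # 1 2 3∧.=1 0 3
```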


EXERCISES 2.1 


1. Compute, by hand and by machine if available, the matrix-vector product w = Av. 
14 
a) (b) A=]0 Ofv =(4,-1) 
oe) 
3-1 
Peer cn) 
1 i 0 
@) A=] 4 1 -jr=(.1.0) 
1600 SOR — 1 
@ 4=[} jle=@m () a4=tl 3 4 2 
v=(. Lhd) 


(Omit machine computation for this problem.) 


(hy) 4 =[3)v =(4) 


r 
(ge) 4=]afe 


2 Compute, by hand and by machine if available, the matrix-matrix product 
indicated, if possible. 


@ Laie i] © Gills 
cee eee 
i [: 1 i au [ 5 ill 


fers ‘ a 2 E ey cel 


1 =3] 1-3 
setae} -1 2 


3 .f7 qe [}° 6 1 
ray Bi 4 iw |s 
* 5 Oa]; fa 




1 i) 03 4-l 9 -5 5 
GW aBwnee¢=[5 “| 9 <1 0 -1 0 =3 1 = 


= 4 1 4 4-! 20 ' 1 
ale 


yes oat Wr a 
[i || 4 
im Ty offs 


o (5 alti ol 
© (oll ‘) Mot fr olds alli od 
{) 
obit] » GM! 
3 taa=[} eh e=[} ~ 4). show tnat AB = BA in this case 
4. ved =[f Oana let B be any 2-by-2 matrix. Show that AB = BA. 


5. Let A be a 2-by-2 matrix. Suppose that AB = BA for every 2-by-2 matrix B.
Show that there is some number a such that

$$A = \begin{bmatrix}a&0\\0&a\end{bmatrix}$$

6. Use Proposition 2.2 to check that any matrix of the form $AA^T$ or $A^TA$ is symmetric.
7. Rewrite the following systems of equations as a single matrix equation.


(a) x +4y=2 (b) 4y=3 (©) 3x +4y— 4x =7 
Tx —6y=1 2+ y=0 m+ x=) 
=0 
(4), =3x) 4 y= Ixy) x = 4 447 
Va = AN, + Xp y= 641 
(8) 


8. Matrices may have no rows or columns. That is, a matrix might be 0-by-4 or
5-by-0. To create such a matrix in APL use A←0 4⍴0 or A←5 0⍴0.


(a) If A is 0-by-10 and B is 10-by-0, then C = AB should be 0-by-0. Check
this on a machine if one is available.

(b) If A is n-by-0 and V is a vector without components (0=⍴V), then
W←A+.×V is a vector with n components. Use the identity (A+.×V)[I]=A[I;]+.×V,
exercise 50 of Exercises 1.2, and the fact that (V+.×W)=+/V×W if V and W are
vectors to show that the components of W are zeros. This exercise shows that a linear
combination of an empty set of vectors gives a zero vector.


(c) Let A be n-by-0 and B be 0-by-m. Show that AB is an n-by-m matrix of
zeros. (Hint: See part (b).) Verify this on a machine if one is available.


22 Inverse Matrices 63 


9. What is the result of the APL computations? 


@) 123 +02 (b) 123 —x2 
() 1294.02 (@) 149405 
(©) 2x1 23 () viz)ercasn7 
@ viz60rsa7 (h) 1230-155 
@ 123Ver33 G) 12908198 
(i) 123A<093 WM 129 A<133 
(m)123Ve1a3 (n) Ad 2016 
AN.=3 4 
(0) And 2066 (p) And 291300 
AV eS 4 +A. =0 
(q) And 201 300 () And 291 3.00 
+/AV. #0 +/0/. mA 
() Arm 291 300 
+/0V #A 


10. Write APL expressions to: 
(a) Count the number of zero rows in the matrix A. 
(b) Count the number of nonzero rows in the matrix A. 
(c) Count the number of zero columns in the matrix A. 
(d) Count the number of nonzero columns in the matrix A.


11. Use Proposition 2.1 and Definition 2.4 to show that if $A^TA$ is a matrix of zeros,
then A is a matrix of zeros.


2.2 Inverse Matrices


There is an algebra of matrices similar in many respects to the familiar algebra of
real numbers. There are, however, some fundamental differences. One of these
differences has already been encountered. Given matrices A and B, it is not true
in general that AB = BA. Matrix algebra is noncommutative.

We shall see shortly that cancellation also fails. If AB = AC, we cannot “cancel
the A's” and deduce that B = C. In general B ≠ C.

One important property of scalar multiplication that does hold for matrix 
multiplication is the associative law. 


Proposition 2.3. Let A, B, C be matrices with AB and BC defined. Then 
A(BC) = (AB)C. 


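Proof Let A be m-by-n. Since AB is defined, B must have n rows. Say B is n-by-p.
Since BC is defined, C must have p rows. Say C is p-by-q.

Then BC is n-by-q and A(BC) is m-by-q. On the other hand, AB is m-by-p,
and so (AB)C is m-by-q. Thus A(BC) and (AB)C have the same shape. Next we
show that corresponding entries are equal.


Consider the i, j entry of A(BC), that is, (A(BC))[i;j].

$$(A(BC))[i;j] = \sum_{k=1}^{n} A[i;k]\,(BC)[k;j] \qquad \text{by Proposition 2.1}$$
$$= \sum_{k=1}^{n} A[i;k]\left(\sum_{h=1}^{p} B[k;h]\,C[h;j]\right) \qquad \text{by Proposition 2.1}$$
$$= \sum_{k=1}^{n}\sum_{h=1}^{p} A[i;k]\,B[k;h]\,C[h;j]$$

Similarly,

$$((AB)C)[i;j] = \sum_{h=1}^{p} (AB)[i;h]\,C[h;j]$$
$$= \sum_{h=1}^{p}\left(\sum_{k=1}^{n} A[i;k]\,B[k;h]\right)C[h;j]$$
$$= \sum_{h=1}^{p}\sum_{k=1}^{n} A[i;k]\,B[k;h]\,C[h;j]$$

The two double sums contain the same np terms, added in a different order, so
(A(BC))[i;j] = ((AB)C)[i;j] for all i, j. ∎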


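EXAMPLE 2.12 It occasionally happens that a calculation is made easier by invok-
ing the associative law. As an extreme example consider the following calculation.
Here A is 4 by 1, B is 1 by 5, and C is 5 by 1.

$$
(AB)C=\left(\begin{bmatrix}1\\-1\\2\\3\end{bmatrix}\begin{bmatrix}1&3&-2&5&7\end{bmatrix}\right)\begin{bmatrix}1\\-1\\2\\-2\\1\end{bmatrix}
=\begin{bmatrix}1&3&-2&5&7\\-1&-3&2&-5&-7\\2&6&-4&10&14\\3&9&-6&15&21\end{bmatrix}\begin{bmatrix}1\\-1\\2\\-2\\1\end{bmatrix}
=\begin{bmatrix}-9\\9\\-18\\-27\end{bmatrix}
$$

$$
A(BC)=\begin{bmatrix}1\\-1\\2\\3\end{bmatrix}\left(\begin{bmatrix}1&3&-2&5&7\end{bmatrix}\begin{bmatrix}1\\-1\\2\\-2\\1\end{bmatrix}\right)
=\begin{bmatrix}1\\-1\\2\\3\end{bmatrix}\begin{bmatrix}-9\end{bmatrix}
=\begin{bmatrix}-9\\9\\-18\\-27\end{bmatrix}
$$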
i 


The trick illustrated in Example 2.12 matters in machine implementations
of some important linear algebra algorithms. If all three vectors were of length n,
for example, then the first calculation involves performing $2n^2$ multiplications and
keeping track of more than $n^2$ numbers, whereas the second calculation involves
$2n$ multiplications and keeping track of $3n$ numbers. For n = 100, say, this is
20,000 multiplications against 200, a significant difference.
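The operation counts above are easy to confirm experimentally. The following minimal Python sketch is our own illustration: the helper `matmul_counted` is hypothetical and belongs to neither the book nor APL. It multiplies the matrices of Example 2.12 in both orders while counting scalar multiplications, then repeats the count for vectors of length n = 100.

```python
def matmul_counted(X, Y):
    """Multiply matrices stored as lists of rows.

    Returns (product, number of scalar multiplications performed).
    """
    rows, inner, cols = len(X), len(Y), len(Y[0])
    if len(X[0]) != inner:
        raise ValueError("inner dimensions do not agree")
    product = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                product[i][j] += X[i][k] * Y[k][j]
                count += 1
    return product, count


A = [[1], [-1], [2], [3]]          # 4 by 1
B = [[1, 3, -2, 5, 7]]             # 1 by 5
C = [[1], [-1], [2], [-2], [1]]    # 5 by 1

AB, c1 = matmul_counted(A, B)
AB_C, c2 = matmul_counted(AB, C)   # (AB)C: goes through a 4-by-5 matrix
BC, c3 = matmul_counted(B, C)
A_BC, c4 = matmul_counted(A, BC)   # A(BC): goes through a 1-by-1 matrix

print(AB_C, c1 + c2)   # [[-9], [9], [-18], [-27]] 40
print(A_BC, c3 + c4)   # [[-9], [9], [-18], [-27]] 9

# Vectors of length n = 100: 2n^2 versus 2n multiplications.
n = 100
col, row = [[1] for _ in range(n)], [[1] * n]
P, d1 = matmul_counted(col, row)   # n-by-n intermediate
_, d2 = matmul_counted(P, col)
Q, d3 = matmul_counted(row, col)   # 1-by-1 intermediate
_, d4 = matmul_counted(col, Q)
print(d1 + d2, d3 + d4)            # 20000 200
```

Both orders give the same product, as Proposition 2.3 guarantees. Only the amount of work differs.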

Next we take up the problem of the cancellation law. We begin with an
example to show that it does not work. We need three matrices A, B, C such that
AB = AC but $B \neq C$.


EXAMPLE 2.13


\ 53 
2-1 1 eet o1 
Saal et ae 3:41 
Ones -10 
ae | 
Te th lee aa 0 
ae Ls 5 A alk 2 1 
oO -1 
33 
ea iirc 
Vex Gea Sl] Sales 
=1 0 


Here AB = AC, but $B \neq C$.


In the system of real numbers cancellation is related to division. Cancellation
for the real numbers can be justified by the following argument.

Suppose that ab = ac and $a \neq 0$ (if a = 0, we cannot cancel). Since $a \neq 0$, we
may divide both sides of the equation by a to get b = c. Let us look at this division
in another way. Since $a \neq 0$, we can form $1/a = a^{-1}$. Now multiply both sides of
the equation by $a^{-1}$:

$$a^{-1}(ab) = a^{-1}(ac)$$

Now use the associative law of multiplication:

$$(a^{-1}a)b = (a^{-1}a)c$$

Since $a^{-1}a = 1$ this becomes

$$1\cdot b = 1\cdot c$$

Since $1\cdot b = b$ and $1\cdot c = c$, we have the result.




Since cancellation does not hold for matrices, this argument must fail for
matrices. By Proposition 2.3 the associative law holds, and so the problem is the
equation $a^{-1}a = 1$. There are two problems here. What is “$a^{-1}$” and what is “1”
for matrices? The second question is easy to answer.


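Definition 2.5. The identity matrix of order n is the n-by-n matrix with ones on
the main diagonal and zeros elsewhere [i.e., I[i;k] = 1 if i = k and I[i;k] = 0 if $i \neq k$].
The identity matrices of orders 1, 2, 3 are

$$
\begin{bmatrix}1\end{bmatrix},\qquad
\begin{bmatrix}1&0\\0&1\end{bmatrix},\qquad
\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}
$$

The zero-by-zero matrix (0 0⍴0) is an identity matrix.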


Proposition 2.4 Let A be an m-by-n matrix.

1. If I is the m-by-m identity matrix, then IA = A.
2. If I is the n-by-n identity matrix, then AI = A.

Proof By Proposition 2.1,

$$(IA)[i;j] = \sum_{h=1}^{m} I[i;h]\,A[h;j]$$

Now I[i;h] is 1 if i = h and 0 if $i \neq h$. So the only nonzero term of this sum is
$I[i;i]\,A[i;j] = A[i;j]$. Similarly, AI = A. ■


The question of “$a^{-1}$” for matrices is considerably more complicated.


Definitions 2.6
(i) The matrix L is a left inverse for A if LA = I, an identity matrix.
(ii) The matrix R is a right inverse for A if AR = I, an identity matrix.
(iii) The matrix B is an inverse for A if it is both a left inverse and a right inverse
for A. In this case A is called invertible or nonsingular and B is denoted by $A^{-1}$.


EXAMPLE 2.14 Consider the matrices of Example 2.13. We have AB = I, so that
A is a left inverse of B and B is a right inverse of A. A is also a left inverse of C,
and C is also a right inverse of A.

This shows that a matrix may have more than one right inverse. Taking
transposes and invoking Proposition 2.2, we have $I = I^T = (AB)^T = B^TA^T$. Simi-
larly $I = C^TA^T$. And so a matrix may have more than one left inverse.




Example 2.14 shows that a matrix may have several left or right inverses, but
they must all have the same shape. Suppose that LA = I and A is m-by-n. Then
I = LA has n columns, and L has m columns. Since I is square, L has n rows. Thus if A
is m-by-n, then L is n-by-m. Similarly a right inverse would be n-by-m also.

The next example shows that a matrix may have no inverses at all.


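EXAMPLE 2.15 Let

$$A = \begin{bmatrix}1&0\\0&0\end{bmatrix}$$

Suppose that A has a left inverse L. Then L must be 2 by 2 also. Say

$$L = \begin{bmatrix}a&b\\c&d\end{bmatrix}$$

Since LA = I,

$$
\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}1&0\\0&0\end{bmatrix}=\begin{bmatrix}1&0\\0&1\end{bmatrix},
\qquad\text{that is,}\qquad
\begin{bmatrix}a&0\\c&0\end{bmatrix}=\begin{bmatrix}1&0\\0&1\end{bmatrix}
$$

Comparing the entries in row 2, column 2 gives 0 = 1. Since $0 \neq 1$, this is impossible. Thus A does not have a left inverse. A similar
argument shows that A does not have a right inverse.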


If a matrix is invertible, then there is only one inverse. For suppose that
$LA = I_n$ and $AR = I_m$, where $I_n$, $I_m$ are identity matrices of orders n and m. Then
A is m-by-n, and L, R are both n-by-m. Multiply through the first equation by R
on the right:

$$
\begin{aligned}
(LA)R &= I_nR\\
(LA)R &= R && \text{by Proposition 2.4}\\
L(AR) &= R && \text{by Proposition 2.3}\\
LI_m &= R\\
L &= R && \text{by Proposition 2.4}
\end{aligned}
$$

If L′ is another left inverse for A, then the argument shows L′ = R also. Hence
L′ = L. Similarly all right inverses are equal. Hence all left and right inverses are
equal.

This justifies using the symbol $A^{-1}$ when A is invertible, since there is only one
inverse.
The following result will be proved in Chapter 3. 




Proposition 2.5 Let A be a matrix.

1. If A is invertible, then A is square.
2. If A is square and A has a left inverse, then A is invertible.
3. If A is square and A has a right inverse, then A is invertible.


The import of this theorem is that if A is square and BA = I, then $B = A^{-1}$
and so AB = I automatically. Conversely if AB = I, then $B = A^{-1}$ and BA = I
automatically. Thus only one of the conditions, AB = I or BA = I, need be
checked if B is a candidate for $A^{-1}$.

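Returning to the question of cancellation, if AB = AC and A has a left inverse
L, then B = C. The argument is as follows:

$$
\begin{aligned}
AB &= AC\\
L(AB) &= L(AC)\\
(LA)B &= (LA)C && \text{by Proposition 2.3}\\
IB &= IC\\
B &= C && \text{by Proposition 2.4}
\end{aligned}
$$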


Solving Linear Equations 


We saw in Section 2.1 that any system of linear equations could be rewritten as a
single matrix equation of the form

$$AX = B$$

Now if L is any left inverse of A, then

$$
\begin{aligned}
L(AX) &= LB\\
(LA)X &= LB\\
X &= LB
\end{aligned}
$$

So if a solution X exists, it must be LB. Thus we have proved the next proposition.


Proposition 2.6 Let A, B be matrices and assume that A has a left inverse L.
Then the equation

$$AX = B$$

has the solution X = LB or it has no solution at all. ■


If we are given L, we set X = LB and then check to see if this is a solution.

The problem, of course, is finding L. In Chapter 3 we will develop a proce- 
dure to do this. Another procedure will be mentioned in Chapter 5. These proce- 
dures can be used to give X directly, so that often L is not explicitly calculated. 




The procedures will also give X when L does not exist. For the moment, however, 
we give a formula for L when A is 2 by 2 and a “canned” procedure for more 
general A. 
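Proposition 2.7 Let

$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$$

Then A is invertible if and only if $\Delta = ad - bc \neq 0$, and if $\Delta \neq 0$, then

$$A^{-1} = \frac{1}{\Delta}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$

Proof If $\Delta \neq 0$, then

$$
\frac{1}{\Delta}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}\begin{bmatrix}a&b\\c&d\end{bmatrix}
= \frac{1}{\Delta}\begin{bmatrix}\Delta&0\\0&\Delta\end{bmatrix}
= \begin{bmatrix}1&0\\0&1\end{bmatrix}
$$

by the rule for multiplying scalars and matrices. Thus $A^{-1}$ has the stated form by
Proposition 2.5.

Suppose, however, that $\Delta = 0$. Then if a = b = c = d = 0 we have

$$A = \begin{bmatrix}0&0\\0&0\end{bmatrix}$$

which is not invertible, since every product of A with another matrix is a zero matrix
and never an identity matrix. Otherwise $\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$ is not a zero matrix. Now

$$
\begin{bmatrix}d&-b\\-c&a\end{bmatrix}A = \begin{bmatrix}\Delta&0\\0&\Delta\end{bmatrix} = \begin{bmatrix}0&0\\0&0\end{bmatrix} = \begin{bmatrix}0&0\\0&0\end{bmatrix}A
$$

If A were invertible, it could be canceled, giving

$$\begin{bmatrix}d&-b\\-c&a\end{bmatrix} = \begin{bmatrix}0&0\\0&0\end{bmatrix}$$

which is not true. ■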


Writing out matrices consisting of all zeros, as in the proof above, is tedious. 




Definition 2.7 A zero matrix is a matrix of any shape consisting entirely of
zeros. Zero matrices will be denoted by 0. The shape can be deduced from the
context.


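EXAMPLE 2.16 Solve the linear system.

$$
\begin{aligned}
x + 2y &= 3\\
3x + 4y &= 5
\end{aligned}
$$

Solution Rewriting as a matrix equation gives

$$\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}3\\5\end{bmatrix}$$

Here $\Delta = 1\cdot 4 - 2\cdot 3 = -2$, so by Propositions 2.6 and 2.7

$$
\begin{bmatrix}x\\y\end{bmatrix} = -\frac{1}{2}\begin{bmatrix}4&-2\\-3&1\end{bmatrix}\begin{bmatrix}3\\5\end{bmatrix} = -\frac{1}{2}\begin{bmatrix}2\\-4\end{bmatrix} = \begin{bmatrix}-1\\2\end{bmatrix}
$$

is the only possible solution. To check if it is a solution, substitute back into the
original equation:

$$
\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}-1\\2\end{bmatrix} = \begin{bmatrix}-1+4\\-3+8\end{bmatrix} = \begin{bmatrix}3\\5\end{bmatrix}
$$

Thus the system has the unique solution $x = -1$, $y = 2$. ■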
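Proposition 2.7 and the check-by-substitution method of Example 2.16 can be sketched in a few lines of Python. This is a minimal illustration that assumes exact rational arithmetic from the standard `fractions` module. The function names `inverse_2x2` and `matvec` are ours, and the system used is x + 2y = 3, 3x + 4y = 5.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a 2-by-2 matrix by the formula of Proposition 2.7.

    Returns None when Delta = ad - bc is zero (A is not invertible).
    """
    (a, b), (c, d) = A
    delta = a * d - b * c
    if delta == 0:
        return None
    f = Fraction(1) / delta              # exact 1/Delta, no rounding
    return [[d * f, -b * f], [-c * f, a * f]]


def matvec(M, v):
    """Matrix-vector product, row by row."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# The system x + 2y = 3, 3x + 4y = 5.
A = [[1, 2], [3, 4]]
B = [3, 5]

L = inverse_2x2(A)        # here L is the (two-sided) inverse of A
X = matvec(L, B)          # the only candidate solution (Proposition 2.6)
print(X)                  # [Fraction(-1, 1), Fraction(2, 1)]
print(matvec(A, X) == B)  # True, so the candidate really is a solution
print(inverse_2x2([[1, 2], [2, 4]]))  # None, since Delta = 0
```

Exact fractions are used so that the check in the last step is an equality test rather than a judgment about rounding.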


The next example shows that the equation AX = B need not have a solution
even if A has a left inverse.


EXAMPLE 2.17 Let


Moat 1 
Az=|-2 3 2 
-3 3 3 


and consider the system AX = B. By checking directly, we may verify that
[° tea 

81 | 

is a left inverse for A. Thus, multiplying through by L gives 


veusals i a - [il 




Substituting in the original equation, we have


alts] =|-2 a]lts] -[:]ee 
=s 3 15 


Thus the system has no solution. 


Proposition 2.7 gives an easily checked condition for a 2-by-2 matrix to be
invertible ($\Delta \neq 0$). Next we get a condition for a general matrix to have a left
inverse. The full proof will have to wait until Chapter 3. First we state some useful
identities. The proofs are left as exercises.


Proposition 2.8 Let A, B, C, D be vectors or matrices such that AB, B + C, and
CD are defined. Let α and β be scalars. Then

1. A(B + C) = AB + AC
2. (B + C)D = BD + CD
3. A(αB) = αAB = ABα
4. (α + β)A = αA + βA ■


Now consider a matrix A in which, say, the third column is the sum of the first
column and twice the second column: A[;3] = A[;1] + 2A[;2]. Then A cannot
have a left inverse L. To see this, suppose that L were a left inverse of A. Then
LA = I. Now by the definition of matrix multiplication, (LA)[;3] = LA[;3]. So

$$
\begin{aligned}
I[;3] &= (LA)[;3]\\
&= LA[;3]\\
&= L\bigl(A[;1] + 2A[;2]\bigr)\\
&= LA[;1] + L\bigl(2A[;2]\bigr) && \text{by Proposition 2.8, Statement 1}\\
&= LA[;1] + 2LA[;2] && \text{by Proposition 2.8, Statement 3}\\
&= I[;1] + 2I[;2]
\end{aligned}
$$

But for an identity matrix I,

$$I[;3] \neq I[;1] + 2I[;2]$$

since

$$I[;3] = (0, 0, 1, 0, \ldots) \neq (1, 2, 0, 0, \ldots) = I[;1] + 2I[;2]$$

In fact, no column of I is a linear combination of the other columns. It follows by
an argument similar to the above that if A has a left inverse, then no column of A
can be a linear combination of the other columns.




Definition 2.8. A set of vectors is called linearly independent if no one of them is
a linear combination of the others.


The proof of the next result must be postponed until Chapter 3. 


Proposition 2.9 A matrix A has a left inverse if and only if the columns of A are
linearly independent. ■


We have been concentrating on left inverses and will continue to do so. State-
ments about right inverses can be derived from corresponding statements about
left inverses via the next proposition. For example, a matrix A will have a right
inverse if and only if the rows of A are linearly independent.


Proposition 2.10. The matrix R is a right inverse for the matrix A if and only if
$R^T$ is a left inverse for $A^T$.

Proof AR = I if and only if $I = I^T = (AR)^T = R^TA^T$. ■


The above definition of linear independence is somewhat clumsy to apply in 
most circumstances. The next proposition gives a more usable formulation. 


Proposition 2.11 Let A be a matrix and X an unknown vector. The columns of
A are linearly independent if and only if the equation

$$AX = 0$$

has only the solution X = 0.

Proof The zeros in the statement of the proposition are, of course, zero vectors.
Let $v_i = A[;i]$ and let $x_i = X[i]$. Then $X \neq 0$ if $x_i \neq 0$ for some i.

First assume that AX = 0 has a solution $X \neq 0$. Then by definition of ma-
trix-vector multiplication:

$$x_1v_1 + x_2v_2 + \cdots + x_iv_i + \cdots + x_nv_n = 0$$

with, say, $x_i \neq 0$. Then $v_i$ is a linear combination of the other v's. In fact

$$v_i = -\frac{x_1}{x_i}v_1 - \cdots - \frac{x_{i-1}}{x_i}v_{i-1} - \frac{x_{i+1}}{x_i}v_{i+1} - \cdots - \frac{x_n}{x_i}v_n$$

The argument is reversible. If

$$v_i = a_1v_1 + \cdots + a_{i-1}v_{i-1} + a_{i+1}v_{i+1} + \cdots + a_nv_n$$

then $X = (a_1, \ldots, a_{i-1}, -1, a_{i+1}, \ldots, a_n)$ is a nonzero solution. ■
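Proposition 2.11 reduces the question of independence to whether AX = 0 has a nonzero solution. The sketch below decides that question by row reduction with exact fractions. The text does not develop this technique until Chapter 3, so treat the sketch only as an illustration. The function name `columns_independent` and the test matrices are hypothetical. The idea is that a column without a pivot corresponds to a free unknown, and hence to a nonzero solution of AX = 0.

```python
from fractions import Fraction


def columns_independent(A):
    """True if the columns of A are linearly independent.

    Reduces A to echelon form with exact fractions. The columns are independent
    exactly when every column receives a pivot, i.e. when AX = 0 forces X = 0.
    """
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for j in range(cols):
        # look for a nonzero entry in column j at or below pivot_row
        p = next((i for i in range(pivot_row, rows) if M[i][j] != 0), None)
        if p is None:
            return False   # column j has no pivot: a nonzero solution of AX = 0 exists
        M[pivot_row], M[p] = M[p], M[pivot_row]
        for i in range(pivot_row + 1, rows):
            factor = M[i][j] / M[pivot_row][j]
            M[i] = [a - factor * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
    return True


# Third column = first column + 2 * second column, as in the discussion above.
print(columns_independent([[1, 0, 1], [0, 1, 2], [1, 1, 3]]))   # False
print(columns_independent([[1, 1], [1, 2], [1, 3]]))            # True
```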




Domino 


A matrix A may have several left inverses if it is not square (see Exercise 4). One
of these, the pseudo-inverse defined in Section 2.3, is denoted in APL by ⌹A. The
symbol ⌹ (⎕-backspace-÷) is called domino.

Thus if A is a matrix with linearly independent columns, then L←⌹A produces
a left inverse L for A.†


EXAMPLE 2.18 


38006 49-86 
21-80 491 

17 “245-97 Baa 

12-80 39 22 

45-59 29 786 

      ⎕←L←⌹A

0.003 0.034 0.003 -0.071 0.053 
0 003 0.030 0.001 0.061 0 032 
0 010 0.058 0.004 -0 093 0 059 
0.002 0.014 0.002 -0 027 0.015 


      L+.×A
1000 1.03617 3.10E17 °3.17E-17 
9.B7E-18 1 00E0 1936-17 -1 65E-17 
1736-17 -1.39€-17 1 00E0 330-17 
3696-18 -3.00E-18 9 226-18 1 0060 & 


The example above illustrates a serious problem with machine computation:
the calculations are not exact. In fact, since only a finite number of digits can be
stored in a machine, a number such as 1/3 cannot even be stored exactly.

As the numbers are stored in the machine, errors occur. As calculations are
made and numbers are rounded off, more errors occur. In bad situations these
errors can accumulate and render the computed answer meaningless.

We will guard against this problem by always checking the answer. In the
example above we computed L+.×A to see if L was an acceptable left inverse.
The answer was not exactly an identity matrix but was close enough. The compu-
tation was done on a UNIVAC machine, which carries about eighteen decimal
places. The deviations of the entries of L+.×A from I are in the last few places.
This illustrates our working assumption for machine calculations.

Working Assumption: The last few decimals are always wrong.

There is no absolute rule for accepting or rejecting the result of a machine
calculation. The amount of accuracy needed for the particular application is all
that is required.
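The rounding effects described above appear in any binary floating-point system, not only on the UNIVAC used for the book. The minimal Python sketch below assumes IEEE double precision (about 16 significant decimal digits) and uses a hypothetical 4-by-3 matrix with independent columns. It first shows that 1/3 is not stored exactly. It then forms the left inverse $(A^TA)^{-1}A^T$ in floating point and measures how far L+.×A is from I. The Gauss-Jordan routine is our own choice and is not the algorithm behind ⌹.

```python
from fractions import Fraction

# 1/3 has no finite binary expansion, so the stored double is only close to 1/3.
print(Fraction(1 / 3) == Fraction(1, 3))   # False


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting, in floating point."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(aug[i][j]))   # largest pivot
        aug[j], aug[p] = aug[p], aug[j]
        pivot = aug[j][j]
        aug[j] = [x / pivot for x in aug[j]]
        for i in range(n):
            if i != j:
                factor = aug[i][j]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[j])]
    return [row[n:] for row in aug]


A = [[1, 2, 0], [3, 1, 4], [0, 5, 6], [7, 1, 2]]   # hypothetical, independent columns
At = transpose(A)
L = matmul(inverse(matmul(At, A)), At)             # (A^T A)^-1 A^T in floating point
LA = matmul(L, A)
deviation = max(abs(LA[i][j] - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))
print(deviation)   # tiny; whether it is exactly zero depends on the rounding
```

As the Working Assumption suggests, the useful question is whether the deviation is small compared with the data, not whether it is zero.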


† Provided A is not “ill-conditioned.” See Section 3.5.




Further, if the numbers in a problem are the result of laboratory or statistical
measurements, then one cannot expect an answer more accurate than the original
measurements.

For the examples in this book, any number that is fifteen orders of magnitude
less than the original data will usually be considered to be zero.


EXAMPLE 2.19 
A 
3a0e13 60068 49085 8 60Et 
210613 5. 00€9 49065 1000 
1.70613 2.4510 9. 70E5 8 44e2 
120613 8 00E9 3.90e5 2.2061 
45013 5 90E9 2 905 8 60E1 
      ⎕←L←⌹A
3.55615 9. 4BE-14 3 17E-15 -7. S0E-14 5 SDE 16 
3.12E-11 9.04€-10 1 99E-11 -6 12E-10 9. 24-10 
1056-6 $8366 4 616-7 9 36E-6 5 996-6 
QNOES 1.4962 2.2263 2.7162 1. S1E°2 
      L+.×A
1 000 428-21 3. SSE-26 4. 50E-29 
895-14 1.0060 5 296-22 -2 42€-25 
2216-9 7.21613 1 00€0 4876-21 
4.6566 1.04E-9 3.20614 1 0060 


We accept the new L because the size of the off-diagonal entries of L+.×A is less than $10^{-15}$
times the original entries of A. ■

The factor $10^{-15}$ that we are using here is peculiar to the machine used to
calculate the examples. Other factors must be used for other machines. The factor
is called the comparison tolerance or fuzz. (To obtain the fuzz for a particular
machine, type ⎕CT. The symbol ⎕CT will be discussed in detail in Section 3.4 of
Chapter 3.)
EXAMPLE 2.20 Attempt to solve the system of equations

$$
\begin{aligned}
-88x + 30y - 2z &= 1\\
-41x + 38y + 16z &= 2\\
69x + 90y + 35z &= 3
\end{aligned}
$$

Solution Let A be the matrix of coefficients.

      A←3 3⍴¯88 30 ¯2 ¯41 38 16 69 90 35
      A
¯88 30 ¯2
¯41 38 16
 69 90 35

Let L be the pseudo-inverse.

      ⎕←L←⌹A
¯0.001 ¯0.012  0.005
 0.025 ¯0.029  0.015
¯0.064  0.101 ¯0.021

Then the candidate for a solution is

      ⎕←X←L+.×1 2 3
¯0.00916 0.0114 0.0744

Check the solution by looking at $B - AX$:


      1 2 3-A+.×X
1046-17 3 476-18 3.47E-18 


These numbers, called residuals, are negligible compared to the input data
(for the machine used to compute the example), and so we accept X as a solution;
that is, to three significant figures, $x = -0.00916$, $y = 0.0114$, $z = 0.0744$. ■
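Example 2.20 has integer data, so the same computation can be repeated in exact rational arithmetic, where the residuals come out exactly zero. This minimal Python sketch uses Gauss-Jordan elimination. That is our choice, since the text does not say how ⌹ is computed. The helper names `inverse` and `matvec` are hypothetical.

```python
from fractions import Fraction


def inverse(M):
    """Exact Gauss-Jordan inverse over the rationals; raises ValueError if M is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for j in range(n):
        p = next((i for i in range(j, n) if aug[i][j] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        aug[j], aug[p] = aug[p], aug[j]
        pivot = aug[j][j]
        aug[j] = [x / pivot for x in aug[j]]
        for i in range(n):
            if i != j and aug[i][j] != 0:
                factor = aug[i][j]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[j])]
    return [row[n:] for row in aug]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


A = [[-88, 30, -2], [-41, 38, 16], [69, 90, 35]]
B = [1, 2, 3]
L = inverse(A)                  # exact inverse, so also a left inverse
X = matvec(L, B)                # the candidate X = LB of Proposition 2.6
residuals = [b - ax for b, ax in zip(B, matvec(A, X))]
print([round(float(x), 5) for x in X])   # [-0.00916, 0.01142, 0.0744]
print(all(r == 0 for r in residuals))    # True
```

The nonzero residuals printed by the machine in Example 2.20 therefore reflect rounding, not a defect in the candidate solution.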


EXAMPLE 2.21 Attempt to solve the system of equations

$$
\begin{aligned}
57x_1 + 94x_2 - 22x_3 &= 1\\
-40x_1 - 88x_2 + 48x_3 &= 2\\
-19x_1 - 8x_2 + 50x_3 &= 3\\
-563x_1 - 1220x_2 + 114x_3 &= 4
\end{aligned}
$$

Solution Let A be the matrix of coefficients, L the pseudo- (left) inverse, and X
the possible solution defined by L.

      A←4 3⍴57 94 ¯22 ¯40 ¯88 48 ¯19 ¯8 50 ¯563 ¯1220 114
      A
  57    94 ¯22
 ¯40   ¯88  48
 ¯19    ¯8  50
¯563 ¯1220 114

      ⎕←L←⌹A


0.048 0.041 -0.020 0.000 
0.021 -0 017 0.010 -0 001 
9.007 0.018 0.007 -0 000 
      ⎕←X←L+.×1 2 3 4
0.0729 -0.0309 0 0632 




Checking the residuals: 


      1 2 3 4-A+.×X
1.14 "0.835 0.98 0,142 


These residuals are too large to be acceptable. We conclude that the system 
has no solution. 


EXAMPLE 2.22 Attempt to solve the system

$$
\begin{aligned}
x + 2y + 3z &= 6\\
4x + 5y + 6z &= 15\\
7x + 8y + 9z &= 24
\end{aligned}
$$

Solution Proceeding as in Examples 2.20 and 2.21,

      ⎕←A←3 3⍴⍳9
1 2 3
4 5 6
7 8 9
      L←⌹A
DOMAIN ERROR
      L←⌹A
      ^

A has no left inverse and our procedure breaks down: no conclusion.

Notice that the last system does have solutions: x = y = z = 1 is a solution,
as is x = 0, y = 3, z = 0. Systems such as this will be solved in Chapter 3.
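The failure in Example 2.22 can be confirmed directly. The short Python sketch below uses helper names of our own. It takes the right-hand side to be 6, 15, 24, which is the value of AX at X = (1, 1, 1). The sketch shows that the determinant of A is zero and that two different vectors give the same product. Their difference is a nonzero solution of AX = 0, so by Propositions 2.9 and 2.11 A has no left inverse.

```python
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [6, 15, 24]


def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def matvec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]


print(det3(A))                 # 0: A is singular
print(matvec(A, [1, 1, 1]))    # [6, 15, 24]
print(matvec(A, [0, 3, 0]))    # [6, 15, 24]
print(matvec(A, [1, -2, 1]))   # [0, 0, 0]: a nonzero solution of AX = 0
```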


Dyadic ⌹

The function ⌹ has a dyadic form. If A is a matrix and B is a matrix or vector,
then X←B⌹A directly produces a candidate for a solution to AX = B. This form
of domino uses less machine time than X←(⌹A)+.×B.


EXAMPLE 2.23 Attempt to solve the linear system

$$
\begin{aligned}
8u - v + 5w &= -4\\
12v - 7w &= 0\\
3u - 5v + 3w &= 1
\end{aligned}
$$

Solution

      ⎕←A←3 3⍴8 ¯1 5 0 12 ¯7 3 ¯5 3
8 ¯1  5
0 12 ¯7
3 ¯5  3
      ⎕←X←¯4 0 1⌹A
0.377 ¯0.927 ¯1.59


      ¯4 0 1-A+.×X
2.08E-17 -5 55E-17 2 95-17 


The residuals are sufficiently small. The solution is, to three significant figures,
$u = 0.377$, $v = -0.927$, $w = -1.59$.
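The text says only that X←B⌹A uses less machine time than X←(⌹A)+.×B. It does not describe how the interpreter does the work. One standard way to avoid forming an inverse is Gaussian elimination followed by back substitution. The Python sketch below applies it to the system of Example 2.23 in exact fractions; the function name `solve` is ours. It takes the coefficients 8 ¯1 5 / 0 12 ¯7 / 3 ¯5 3 and the right-hand side ¯4 0 1, which reproduce the printed answers.

```python
from fractions import Fraction


def solve(A, B):
    """Solve a square system AX = B by Gaussian elimination and back substitution,
    in exact rational arithmetic, without ever forming the inverse of A."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(A, B)]
    for j in range(n):                       # forward elimination
        p = next((i for i in range(j, n) if M[i][j] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        M[j], M[p] = M[p], M[j]
        for i in range(j + 1, n):
            f = M[i][j] / M[j][j]
            M[i] = [a - f * b for a, b in zip(M[i], M[j])]
    X = [Fraction(0)] * n
    for i in reversed(range(n)):             # back substitution
        s = M[i][n] - sum(M[i][k] * X[k] for k in range(i + 1, n))
        X[i] = s / M[i][i]
    return X


A = [[8, -1, 5], [0, 12, -7], [3, -5, 3]]
B = [-4, 0, 1]
u, v, w = solve(A, B)
print(u, v, w)   # 57/151 -140/151 -240/151
print(round(float(u), 3), round(float(v), 3), round(float(w), 2))   # 0.377 -0.927 -1.59
```

For an n-by-n system, elimination costs roughly a third of the arithmetic needed to compute the full inverse and then multiply. That is one plausible reason for the speed difference the text mentions.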


So far we have no general criterion for the equation AX = B to have a solu-
tion. The following proposition is obvious from the definition of matrix-vector
multiplication but is worth stating explicitly nonetheless.

Proposition 2.12 Let A be a matrix and B a vector. The equation AX = B has
solutions if and only if B is a linear combination of the columns of A.
EXERCISES 2.2 


Lo 
| Sonap Operaererine 


2. (a) Show that £ 


Lo 
oO 
(b) Show that A does not have a right inverse. 


3. Let $L_1$, $L_2$ both be left inverses for A. Check that $\alpha L_1 + \beta L_2$ is a left inverse for
A whenever $\alpha + \beta = 1$.

Hint: Use Proposition 2.8. This exercise shows that if A has two distinct left inverses,
then it has an infinity of left inverses.


1-10-3 6 
—2 30 7 -I5}. 


18 -16 -2 -3 -7 
at=| 5 -4 -1 -1 


Sve yr 
(a) Show that $L_1$, $L_2$ are left inverses for A.
(b) Find three more left inverses for A.

Hint: See Exercise 3.




5. (a) Show that if A, B are invertible and AB is defined, then $B^{-1}A^{-1}$ is an inverse for
AB. Hence AB is also invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
(b) Show by example that $(AB)^{-1} \neq A^{-1}B^{-1}$ in general.
6. Show that if A is invertible, then $(A^{-1})^T$ is an inverse for $A^T$ and hence $A^T$ is also
invertible with $(A^T)^{-1} = (A^{-1})^T$. Further, the inverse of a symmetric matrix, when it exists,
is symmetric.

7. Show that if A is invertible, $(A^{-1})^{-1} = A$ and $(A^n)^{-1} = (A^{-1})^n$, where $A^n = AA\cdots A$ (n
times).

8. Show that if A, B, A + B are all invertible, $(A^{-1} + B^{-1})^{-1} = A(A + B)^{-1}B =
B(A + B)^{-1}A$.

9. Show that if $R_1$ and $R_2$ are right inverses for A and $B = R_1 - R_2$, then AB = 0.
Hint: See Proposition 2.8.


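10. Show that

$$\begin{bmatrix}3&0&0\\0&2&0\\0&0&-1\end{bmatrix}^{-1} = \begin{bmatrix}\tfrac13&0&0\\0&\tfrac12&0\\0&0&-1\end{bmatrix}$$

11. Let

$$D = \begin{bmatrix}d_1&&0\\&\ddots&\\0&&d_n\end{bmatrix}$$

be a diagonal matrix (D[i;j] = 0 if $i \neq j$).
(a) Check that if $d_i \neq 0$ for all i, then

$$D^{-1} = \begin{bmatrix}1/d_1&&0\\&\ddots&\\0&&1/d_n\end{bmatrix}$$

(b) Check that if $d_i = 0$ for some i, then D is not invertible.

Hint: Imitate Example 2.15. Or, by inspection, find an $X \neq 0$ such that DX = 0
and invoke Propositions 2.9 and 2.11.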


12, Solve the following linear systems by the method of Example 2.16. 


(@) x+y=1 (b) 2x—y=0 (©) 3e+5y=1 
x=y=l x+¢2y=t 2x Ty =0 
(a) 6x —24y =0 () x+y=3 


18x + 12y 


) x-ys3 


13. In the problems below L is a left inverse for A. Check this fact and use it to solve for
the unknown matrix X, if it exists, in the equation AX = B.


IP-1 ¢ 4 
(a) A=]-2 3-4], Caat=|is 7 
2-1 5 83 


1-2 
(b) A=]-1 3 
1-4 
1 1 = hr 
04 
(c) A -6 
: Eas sa 
aa, oe 




1 
ea 
t 


() A 


14. Use Propositions 2.9 and 2.12 and the results of exercise 13(c) to show, without
computation, that exercise 13(d) cannot have a solution.


15. (Computer assignment) Use ⌹ to determine whether the matrices below have left
inverses. (If not, you will get a DOMAIN ERROR.)


-49 86 6 3316 
(a) 602 ri 0) | 2 =I8, “| 
sl -42 Saiiras7) =8); 
9 a 26-11 9 
es 1-26 (a) [-* 82 “| 
-% 1% 0-29 2 
88-260 
-40 6248 Sr mise! 4a) Fcishesiay 
=80 124 96 -13 -24 -96 -27 82 
9) |_200 310 240 tn) 1-65 40 7 -88 
=80 124 96 4649 32 -40 56 


80 -21 -12 -57 —35. 


16. (Computer assignment) Attempt to solve the given systems of linear equations using
the dyadic function ⌹.


fa) 58x + 22y +7: (b) =472u + 454v — SOw = —22 
97x — 38) + 312 = —35 =97u + 88v + 16w =2 
=42x + 86y — SIz = 36 —69u + S4v + S4w = 12 
() 680x, + 271x, = —397 (a) —2x — 82y = 432 4 277 = —64 
—44x, — 96x, = —89 22x + Ty + 422 
—40x, + 38x, = 44 —74x — 9y — 4: + 59 
28x, — 19x, = 98 —S7x — S3y — S6z — 48 
(e) —38x, — 46x, — 93x, = -7 (f) 354, — 924, — 484, = —6 
23x, + 67x, + 47x = —33 =19t, — 2lty — 87ty = —38 
=95x, + 36x, = —68 4791, — 6061, — 2441, = 142 
13x, — 25x, + 423x, = 50 
42e, + 18x, — 33x, = 48 




17. (Computer assignment) Use ⌹ to determine if the given matrices have right inverses.


Siete ale ne 
(by }-26 40-91 —46 


% 74 -92 22, 


nee 
o [oy Test 
9 


—9% 41 82 -97 —20 
@ | 32-82 30 105 296 
-66 62 =3. -2 -69. 


=80 -30 21 2 —S4 
—5 = =| -27 92 
27-38 -27 =69 00 
—49 89 53-88 —38 


(e) 


18. Verify statements 1 through 4 of Proposition 2.8.


2.3. Matrix Algebra 


The identities of Propositions 2.3 and 2.8 allow us to manipulate matrix equations
in a manner similar to the way we manipulate scalar equations. There are differ-
ences, however. For matrices, $AB \neq BA$ in general. This means that one cannot simply
multiply an equation by a matrix. The equation must be multiplied either “on the
left” or “on the right.”


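EXAMPLE 2.24 Assume that A is invertible and solve the matrix equations for X.

(a) AX = B (b) XA = B (c) AXA = B (d) $A^2X = B$

Solutions

(a) Multiply the equation on the left by $A^{-1}$:

$$
\begin{aligned}
A^{-1}(AX) &= A^{-1}B\\
(A^{-1}A)X &= A^{-1}B\\
X &= A^{-1}B, \quad\text{if } X \text{ exists at all}
\end{aligned}
$$

(b) Multiply the equation on the right by $A^{-1}$:

$$
\begin{aligned}
(XA)A^{-1} &= BA^{-1}\\
X(AA^{-1}) &= BA^{-1}\\
X &= BA^{-1}, \quad\text{if } X \text{ exists at all}
\end{aligned}
$$

(c) First multiply the equation on the left by $A^{-1}$, then on the right by $A^{-1}$: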


23° Matrix Algebra 81 


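$$
\begin{aligned}
A^{-1}(AXA) &= A^{-1}B\\
XA &= A^{-1}B\\
(XA)A^{-1} &= A^{-1}BA^{-1}\\
X &= A^{-1}BA^{-1}, \quad\text{if } X \text{ exists at all}
\end{aligned}
$$

(d) Here $A^2 = AA$. Multiply on the left by $(A^{-1})^2$. (See exercise 7 of Exer-
cises 2.2.)

$$
\begin{aligned}
(A^{-1})^2(A^2X) &= (A^{-1})^2B\\
X &= (A^{-1})^2B, \quad\text{if } X \text{ exists at all}
\end{aligned}
$$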
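To see concretely that the four answers of Example 2.24 differ, here is a small Python sketch. It uses a hypothetical invertible A and right-hand side B, which are our choices and not the book's; any invertible A would do. It computes each formula in exact arithmetic and checks it against the original equation.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of an invertible 2-by-2 matrix (Proposition 2.7)."""
    (a, b), (c, d) = A
    delta = Fraction(a * d - b * c)
    return [[d / delta, -b / delta], [-c / delta, a / delta]]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2], [3, 5]]      # hypothetical invertible matrix (Delta = -1)
B = [[1, 0], [2, 1]]      # hypothetical right-hand side
Ainv = inverse_2x2(A)

Xa = matmul(Ainv, B)                  # (a) AX = B    gives X = A^-1 B
Xb = matmul(B, Ainv)                  # (b) XA = B    gives X = B A^-1
Xc = matmul(matmul(Ainv, B), Ainv)    # (c) AXA = B   gives X = A^-1 B A^-1
Xd = matmul(matmul(Ainv, Ainv), B)    # (d) A^2 X = B gives X = (A^-1)^2 B

print(matmul(A, Xa) == B, matmul(Xb, A) == B)   # True True
print(Xa == Xb, Xc == Xd)                       # False False
```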


The answers to (a) and (b) are different and the answers to (c) and (d) are 
different. In fact, if 


in =a 2 
Az=z = 
alk i| sna 4 
then 
ey 
At 
Lid 
and the answers are 
r=! ‘] Wels 
ala a Wa (Ea al 
-2 0 6 8 
Gres «) ws >| . 


The statements “if X exists at all” are in fact unnecessary in the example
above. Multiplying an equation (on the left or on the right) by an invertible
matrix does not change the solutions. The only problem in verifying this state-
ment is defining exactly what we mean by “equation” and “solution.”

By transposing everything to one side, we can write an equation in the un-
known matrix X in the form f(X) = 0, where f is some function of the matrix X.
Here “0” denotes a zero matrix of the appropriate shape.

For example, the four equations of Example 2.24 can be written

(a) $f(X) = AX - B = 0$ (b) $f(X) = XA - B = 0$
(c) $f(X) = AXA - B = 0$ (d) $f(X) = A^2X - B = 0$

By a solution of an equation f(X) = 0 we mean any matrix S such that
f(S) = 0. Equations may have no solutions, one solution, several solutions, or an
infinity of solutions. The familiar rules from elementary algebra about the num-
ber of solutions of an equation do not usually hold for matrix equations. (See
exercise 4.)




Proposition 2.13
1. If A has a left inverse, then multiplying an equation on the left by A does not
change the set of solutions.
2. If A has a right inverse, then multiplying an equation on the right by A does
not change the set of solutions.

Proof Let the equation be f(X) = 0. If S is a solution, then f(S) = 0, hence
Af(S) = A0 = 0, and so S is a solution of Af(X) = 0. (This shows that multiply-
ing by any A never decreases the number of solutions.)

Conversely, if S is a solution of Af(X) = 0, we have Af(S) = 0. If A has a left
inverse L, then L(Af(S)) = 0, hence (LA)f(S) = 0 or f(S) = 0. Thus S is a
solution of f(X) = 0. ■


Since the matrix $A^{-1}$ of Example 2.24 has a left inverse (namely A), the
equation AX = B has the same solutions as the equation $X = A^{-1}B$.

If A does not have a left inverse, however, left multiplication by A may intro-
duce extra solutions.


EXAMPLE 2.25


= 1 ! 81 
A=| 3 -4|. B=!2] L=b 0 
Sie) 3 


L is a left inverse for A. Solve the equation AX = B.
Solution Multiplying on the left by L, we have


AX =B 
LAX = LB 


ees 
X=LB8=|13 0 1 


| I 
2, =L 6. 
Checking, we have 
-! 1 =10 
(eles f-[2] 


Thus an extra solution has been introduced. The original equation has no
solution. ■


In the example above L itself does not have a left inverse, although A is a
right inverse for L. (In fact if L had a left inverse also, then L would be invertible
and hence, by Proposition 2.5, square.)

Additive terms in the equations pose no problem.


EXAMPLE 2.26 


3 o 1 -1 an 
-3 -1|, 4 0 of, B=|-1 of, 
2 0 -2 3 0 -2 
| 3, —2 -3 

=) 

1 0 

1-4 

2-2 


Using the fact that LA = I, solve AX + B = C.

Solution If X exists, then

$$
\begin{aligned}
AX &= C - B\\
LAX &= L(C - B)\\
X &= L(C - B)
\end{aligned}
$$


and so the only possible solution is 


2 i) Wt Oe 4 a 
Kea) a3: 3-2 —i]{| 1 Of/—|=1 0 
2 2 1 0 1-4 o -2 
2 =a) |-2 = 
3.1 4 o7f-2 Oo] f-3 -2 
=/-3 -3 -2 -I 2 0 | r) 
De eae dp Oy) ae 1-2 

4 1 


If X is a solution, then, substituting into the original equation, we have


1 a =1]/-3 -2 1 oa) f- 
0 1 Ofl-6 3)/+{-1 o}= 
ao =a) SL) 2 o - 

-2 


t a 
iL ty 
1-4 




210, vo) je 
SA ee Gai a ie 
43 S14] "|| 9) =2 (nif) es 
iz) ale 6 2) 8 

ai Say) (et 

Py & 10 

33 =16]~ | 1 =4 

4 4 z 2 


which is false. Thus the equation has no solution. ■


EXAMPLE 2.27 Matrices L, A are as in Example 2.26.


Cen oe 22 leas 2 
es Supe e=[5 i =) 


Solve the equation XL + B = C.

Solution A is a right inverse for L.

$$
\begin{aligned}
XL &= C - B\\
XLA &= (C - B)A\\
X &= (C - B)A
\end{aligned}
$$


The only possible solution is 


r= (2 a2 zee 1) | 
401 -2 200 -1 Oo ths ano 
—2 -4 3 
rob =3 


lhl 
SY 10: 


If X is a solution, then substituting into the original equation we have


ae eae CARE eye yeky Tel cy 
310 ae el eryy eller ny 2 




Ie —2 1 -4 +[ Beale HE 22012 -2 
ON et 200 -1)> L401 -2 
lie fee = 2212 -2 
401 -217-l401 -2 


Thus X is a solution in this case. 


In special cases one may left-multiply by a matrix that does not have a left
inverse without introducing extra solutions. The next proposition provides an
example. Notice, however, that the proposition is no longer true if the equation
AX = 0 is replaced by AX = B. The proposition also provides an explicit formula for
a left inverse.


Proposition 2.14 Let A be a matrix, X an unknown vector. The equation

$$AX = 0$$

has precisely the same solutions as the equation

$$A^TAX = 0.$$

In particular, A has a left inverse if and only if $A^TA$ is invertible. In this case
$(A^TA)^{-1}A^T$ is a left inverse for A.

Proof Every solution of AX = 0 is a solution of $A^TAX = 0$, so we have to show that
multiplying by $A^T$ has not introduced any extra solutions. Let S be a solution of
$A^TAX = 0$. Then $A^TAS = 0$ and hence $S^TA^TAS = 0$, that is, $(AS)^T(AS) = 0$.

But for any matrix V, $V^TV = 0$ if and only if V = 0 (exercise 11 of Exercises
2.1), because

$$(V^TV)[j;j] = \sum_i V^T[j;i]\,V[i;j] = \sum_i V[i;j]^2 = 0$$

which means that V[i;j] = 0 for all i and all j. Taking V = AS gives AS = 0, so S
is a solution of AX = 0.

Thus the two equations have the same solutions. By Propositions 2.9 and 2.11,
A has a left inverse if and only if $A^TA$ has a left inverse. Since $A^TA$ is always
square, it has a left inverse if and only if it is invertible (Proposition 2.5).

The equation $(A^TA)^{-1}A^TA = I$ shows that $(A^TA)^{-1}A^T$ is a left inverse
for A. ■
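Proposition 2.14 gives an explicit left inverse. The minimal Python sketch below uses a hypothetical 3-by-2 matrix A, chosen so that $A^TA$ is 2 by 2 and the formula of Proposition 2.7 applies. It computes $(A^TA)^{-1}A^T$ exactly and confirms that it is a left inverse. It also exhibits a second, different left inverse of the same A, so the formula picks out one particular left inverse among several.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse_2x2(M):
    """Inverse of an invertible 2-by-2 matrix (Proposition 2.7)."""
    (a, b), (c, d) = M
    delta = Fraction(a * d - b * c)
    return [[d / delta, -b / delta], [-c / delta, a / delta]]


A = [[1, 0], [1, 1], [1, 2]]          # hypothetical 3-by-2 matrix, independent columns
At = transpose(A)
AtA = matmul(At, A)                   # [[3, 3], [3, 5]], invertible
L = matmul(inverse_2x2(AtA), At)      # (A^T A)^-1 A^T
print(L)
print(matmul(L, A) == [[1, 0], [0, 1]])               # True

L2 = [[1, 0, 0], [-1, 1, 0]]                          # another left inverse of A
print(matmul(L2, A) == [[1, 0], [0, 1]], L2 == L)     # True False
```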




Definition 2.9 If A has a left inverse, then the particular left inverse $(A^TA)^{-1}A^T$
is called the pseudo-inverse and will be written ⌹A. The product $(A^TA)^{-1}A^TB$ will
be written B⌹A.

This is the left inverse computed by domino and used in Section 2.2. The
“division” “B⌹A” has A on the wrong side, but it is the form the machine
recognizes. The left inverse ⌹A has several nice properties that single it out from
the other left inverses of a matrix A. The most important property, which makes
⌹A quite useful in statistical and curve-fitting applications, is given in the next
proposition. A calculus-based proof will be found in Section 2.5* and a linear
algebra proof is given in Chapter 5.

Assume that A has a left inverse. By Proposition 2.6, if AX = B has solutions,
then it has only the solution X = LB, where L is any left inverse of A. This means
that as L varies through left inverses, the product LB remains constant. If AX = B
has no solutions, however, then LB is free to vary with L and will do so.

Recall that, for a vector V, V+.*2 is the sum of the squares of the components
of V. It follows that 0 = V+.*2 only when V = 0.


Proposition 2.15 Let A be a matrix with linearly independent columns. Let B
be a vector with (⍴A)[1] = ⍴B. Let f(X) = (AX − B)+.*2. The unique value
of X that minimizes f(X) is X = B⌹A.

Notice that $f(X) \ge 0$ and f(X) = 0 only if X is a solution of AX = B.

Definition 2.10 By a least-squares solution of AX = B is meant any S for which
f(S) is a minimum. If A has linearly independent columns, then there is only one
least-squares solution and it is B⌹A or $(A^TA)^{-1}A^TB$.


Least-Squares Lines


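Although a proof of Proposition 2.15 must wait until later, it is not hard to see that
it gives the usual formulas for the least-squares straight line [Equations (1.4)].

Given the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we wish to find the
straight line y = a + bx that minimizes the quantity

$$\sum_{i=1}^{n} (y_i - a - bx_i)^2$$

Now if the points all lie on a line, then the coefficients a, b are the solution of
the system of linear equations

$$
\begin{aligned}
a + bx_1 &= y_1\\
&\;\;\vdots\\
a + bx_n &= y_n
\end{aligned}
$$

or

$$
\begin{bmatrix}1&x_1\\ \vdots&\vdots\\ 1&x_n\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix},
\qquad\text{that is,}\qquad AX = B,
$$

where $B = (y_1, \ldots, y_n)$, X = (a, b), and A is the APL array $(x_1, \ldots, x_n)$∘.*0 1,
whose rows are $(1, x_i)$. The function being minimized is

$$f(X) = \sum_{i=1}^{n} (y_i - a - bx_i)^2,$$

the sum of the deviations squared.

Multiplying through by $A^T$, we obtain

$$
\begin{bmatrix}n & \sum x_i\\ \sum x_i & \sum x_i^2\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}\sum y_i\\ \sum x_iy_i\end{bmatrix},
\qquad\text{that is,}\qquad A^TAX = A^TB.
$$

Solving the latter equation,

$$X = (A^TA)^{-1}A^TB$$

or

$$
\begin{bmatrix}a\\b\end{bmatrix} = \frac{1}{n\sum x_i^2 - \left(\sum x_i\right)^2}\begin{bmatrix}\sum x_i^2 & -\sum x_i\\ -\sum x_i & n\end{bmatrix}\begin{bmatrix}\sum y_i\\ \sum x_iy_i\end{bmatrix}
$$

These are the usual formulas.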
In calculating least-squares straight lines by hand, one simply sets up the
equation AX = B, multiplies through on the left by $A^T$, and applies the formula
for the inverse of a 2-by-2 matrix (Proposition 2.7).


EXAMPLE 2.28 Find the least-squares straight line fitting the data points (2, 1),
(3, 3), (7, 3), (9, 7), and (8, 5). Sketch the resulting straight line with the data
points.


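Solution The matrix equation is

$$
\begin{bmatrix}1&2\\1&3\\1&7\\1&9\\1&8\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}1\\3\\3\\7\\5\end{bmatrix}
$$

or, multiplying through on the left by $A^T$,

$$
\begin{bmatrix}5&29\\29&207\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}19\\135\end{bmatrix}
$$

or

$$
\begin{bmatrix}a\\b\end{bmatrix} = \frac{1}{194}\begin{bmatrix}207&-29\\-29&5\end{bmatrix}\begin{bmatrix}19\\135\end{bmatrix} = \frac{1}{194}\begin{bmatrix}18\\124\end{bmatrix} \approx \begin{bmatrix}0.0928\\0.639\end{bmatrix}
$$

so the least-squares line is $y \approx 0.0928 + 0.639x$.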


When using a machine, of course, one uses domino.


EXAMPLE 2.29 Compute the coefficients of the line of Example 2.28 in APL.

      X←2 3 7 9 8
      Y←1 3 3 7 5
      ⎕←A←X∘.*0 1
1 2
1 3
1 7
1 9
1 8
      Y⌹A
0.0928 0.639
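The hand computation of Example 2.28 amounts to the formulas for a and b derived above. The minimal Python sketch below evaluates them in exact arithmetic for the same five data points; the function name is ours.

```python
from fractions import Fraction


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + bx.

    Uses the normal equations A^T A X = A^T B, with A having rows (1, x_i),
    and inverts the 2-by-2 matrix by Proposition 2.7.
    """
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    delta = Fraction(n * sxx - sx * sx)   # nonzero when at least two x_i differ
    a = (sxx * sy - sx * sxy) / delta
    b = (n * sxy - sx * sy) / delta
    return a, b


a, b = least_squares_line([2, 3, 7, 9, 8], [1, 3, 3, 7, 5])
print(a, b)                                      # 9/97 62/97
print(round(float(a), 4), round(float(b), 3))   # 0.0928 0.639
```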




More generally, the least-squares polynomial of degree k is the function of the
form $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_kx^k$ that minimizes

$$f = \sum_{i=1}^{n} \left(y_i - a_0 - a_1x_i - \cdots - a_kx_i^k\right)^2$$

If k < n and the $x_i$ are distinct, then p(x) is uniquely determined. The least-squares straight line is the
case k = 1.


Proposition 2.16 Let $X = (x_1, \ldots, x_n)$, $Y = (y_1, \ldots, y_n)$ be vectors. The coeffi-
cients of the least-squares polynomial of degree k for the data points
$(x_1, y_1), \ldots, (x_n, y_n)$ are the components of the vector

      Y⌹X∘.*0,⍳k


EXAMPLE 2.30 Fit a least-squares cubic to the data of Example 2.28. Sketch the
result.

      X
2 3 7 9 8
      Y
1 3 3 7 5
      ⎕←COEFF←Y⌹X∘.*0 1 2 3
¯8.39 7.22 ¯1.43 0.0907




To sketch the result we find the value of the polynomial at the points x = 0, 1, 2,
3, 4, 5, 6, 7, 8, 9. Notice that this calculation is easily accomplished via matrix
multiplication:

      (0 1 2 3 4 5 6 7 8 9∘.*0 1 2 3)+.×COEFF
¯8.39 ¯2.51 1.07 2.88 3.47 3.38 3.17 3.36 4.52 7.17

(To facilitate such computations the indeterminate form $0^0$ is taken to be 1 in
APL.)

The graph is

[Figure: the least-squares cubic plotted together with the five data points.]
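Proposition 2.16 can be imitated outside APL by building the matrix whose columns are $x^0, x^1, \ldots, x^k$ and solving the normal equations. The Python sketch below works in exact fractions, and solving the normal equations stands in for ⌹. The names `solve` and `polyfit` are ours. It refits the cubic of Example 2.30 and re-evaluates it at x = 0, 1, ..., 9.

```python
from fractions import Fraction


def solve(M, r):
    """Solve the square system M x = r exactly by Gauss-Jordan elimination.

    Assumes M is invertible (true here when the x_i include at least k + 1 distinct values).
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, r)]
    for j in range(n):
        p = next(i for i in range(j, n) if aug[i][j] != 0)
        aug[j], aug[p] = aug[p], aug[j]
        pivot = aug[j][j]
        aug[j] = [v / pivot for v in aug[j]]
        for i in range(n):
            if i != j and aug[i][j] != 0:
                factor = aug[i][j]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[j])]
    return [row[n] for row in aug]


def polyfit(xs, ys, k):
    """Coefficients a_0, ..., a_k of the least-squares polynomial of degree k,
    from the normal equations (A^T A) C = A^T Y with A[i][p] = x_i ** p."""
    A = [[Fraction(x) ** p for p in range(k + 1)] for x in xs]   # compare X∘.*0 1 ... k
    cols = list(zip(*A))                                          # rows of A^T
    AtA = [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]
    AtY = [sum(a * y for a, y in zip(u, ys)) for u in cols]
    return solve(AtA, AtY)


xs, ys = [2, 3, 7, 9, 8], [1, 3, 3, 7, 5]
coeffs = polyfit(xs, ys, 3)
print([round(float(c), 4) for c in coeffs])   # [-8.3894, 7.2198, -1.4268, 0.0907]

# Values at x = 0, 1, ..., 9, as in (0 1 2 ... 9∘.*0 1 2 3)+.×COEFF
values = [sum(c * Fraction(x) ** p for p, c in enumerate(coeffs)) for x in range(10)]
print([round(float(v), 2) for v in values])
```

As in APL, `Fraction(0) ** 0` evaluates to 1, so the constant term is handled correctly at x = 0.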


Multiple Regression 


The technique above applies to much more than polynomial fits. In applications it
often happens that more than one variable is being measured. For example, a
variable w may be thought to depend on variables x, y, z via a relation of the form
$w = c_1 + c_2x + c_3y + c_4z$. The data points are then vectors $(x_1, y_1, z_1, w_1), \ldots,
(x_n, y_n, z_n, w_n)$. If one sets up the system

$$\begin{bmatrix}1&x_1&y_1&z_1\\\vdots&\vdots&\vdots&\vdots\\1&x_n&y_n&z_n\end{bmatrix}\begin{bmatrix}c_1\\c_2\\c_3\\c_4\end{bmatrix}=\begin{bmatrix}w_1\\\vdots\\w_n\end{bmatrix},\qquad\text{or}\qquad AC = B,$$

which has a solution only when the equations $w_i = c_1 + c_2x_i + c_3y_i + c_4z_i$ are all satisfied
exactly, then the expression B⌹A gives the vector of coefficients $c_1, c_2, c_3, c_4$ that
minimize the expression

$$\sum_{i=1}^{n}\left(w_i - c_1 - c_2x_i - c_3y_i - c_4z_i\right)^2$$

These $c_i$'s are called the multiple-regression coefficients.
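As a cross-check of the multiple-regression recipe, here is a minimal NumPy sketch (NumPy is our assumption; the text uses APL) applied to the mileage data of Example 2.31, which follows. A column of ones supplies the constant term, and `numpy.linalg.lstsq` stands in for B⌹A. The coefficients should come out near −203.6, −8.396, and 31.6, with a sum of squared deviations near 764.6.

```python
import numpy as np

# Table 2.1 (Example 2.31): miles, days, and gallons between fill-ups.
miles = np.array([203, 202, 199, 236, 175, 160, 146], dtype=float)
days = np.array([6, 2, 2, 4, 8, 12, 7], dtype=float)
gallons = np.array([14.6, 13.0, 13.4, 14.9, 13.8, 14.5, 13.6])

# Column of ones for the constant term, then one column per explanatory variable.
A = np.column_stack([np.ones_like(days), days, gallons])

# Multiple-regression coefficients, the analogue of MILES⌹A.
C, *_ = np.linalg.lstsq(A, miles, rcond=None)

deviations = miles - A @ C
ssd = deviations @ deviations  # sum of the squared deviations
print(np.round(C, 3), round(ssd, 1))
```

Because the model includes a constant term, the deviations sum to zero (up to rounding), which is a quick sanity check on any such fit.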


EXAMPLE 2.31 The following table is derived from the data following exercise 61
of Exercises 1.2.

TABLE 2.1
Miles | Days | Gallons
203   |  6   | 14.6
202   |  2   | 13.0
199   |  2   | 13.4
236   |  4   | 14.9
175   |  8   | 13.8
160   | 12   | 14.5
146   |  7   | 13.6

The columns give the number of miles, days, and gallons used between fill-ups.
Find the multiple-regression equation for distance traveled as a function of time
and fuel.


Solution

      DAYS
6 2 2 4 8 12 7
      GALLONS
14.6 13 13.4 14.9 13.8 14.5 13.6
      MILES
203 202 199 236 175 160 146
      ⎕←A←1,⍉2 7⍴DAYS,GALLONS
1.0000  6.0000 14.6000
1.0000  2.0000 13.0000
1.0000  2.0000 13.4000
1.0000  4.0000 14.9000
1.0000  8.0000 13.8000
1.0000 12.0000 14.5000
1.0000  7.0000 13.6000
      ⎕←C←MILES⌹A
¯203.6 ¯8.396 31.6


Thus the regression equation is

miles = −203.6 − 8.396 (days) + 31.6 (gallons)

The sum of the square deviations and the deviations themselves are

      (MILES-A+.×C)+.*2
764.6
      MILES-A+.×C
¯4.379 11.6 ¯4.04 2.348 9.694 6.156 ¯21.38

EXERCISES 2.3

Use the following matrices for exercises 1 and 2.

[Matrices for exercises 1 and 2]

1. Verify the following statements.
(a) R has left inverses L and M.
(b) L has right inverses R and V.
(c) U and A are inverses of each other.
(d) N and J are inverses of each other.
(e) Q is a left inverse for T.
2. Using the results of exercise 1, solve for X and check the solution.
(a) AX = B (b) XA = B (c) XL (d) XL = W (e) RXL = I (f) RXL = 0 (g) DX + EX = F (h) XG + H = NK (i) PXQ + X
Hint: Use Proposition 2.8.
3. (a) Show that if $L_1$, $L_2$ are two left inverses of a matrix A and $Z = B(L_1 - L_2)$, where B is any matrix for which the product is defined, then ZA = 0.
(b) Find a matrix Z ≠ 0 such that ZR = 0.
(c) Find a square matrix Z such that ZR
(d) Find a Z (≠ I) such that ZR = R.
(e) Find a matrix Z (≠ I) such that
4. Show that the "quadratic equation" $X^2 = I$ has an infinite number of solutions when X and I are 2 by 2.
5. Find the least-squares solution of AX = B. Is the least-squares solution a solution?
[Matrices A and B for the parts of exercise 5]
(See exercise 11 of Exercises 2.2.)
6. (Computer assignment) Use ⌹ to find a least-squares solution of AX = B. Is the least-squares solution a solution?
[Matrices A and B for parts (a) through (e)]


7. (Computer assignment) Do exercise 60 in Exercises 1.2, using ⌹.

8. (Computer assignment) Do exercise 68 in Exercises 1.2, using ⌹.

9. (Computer assignment) In Example 2.31, the sum of the squared deviations for miles as
a linear function of days and gallons was about 765. What is the sum of the squared deviations if
one writes miles as a function of gallons per day used between fill-ups?

10. (Computer assignment) Consider the data of Table 1.3 in Exercises 1.3. Find the
regression equation of weight gained as a linear function of height at the beginning of the
experiment, weight at the beginning of the experiment, and fitness per kilogram body
weight [see exercise 31(b) in Exercises 1.3]. Do this for the control group and the experimental
group separately. Find the sum of the square deviations in each case.


11. (Computer assignment) The table is, approximately, the sine curve on the interval
zero to 2π. Sketch the curve along with the least-squares line, parabola, and cubic
approximations.


x | 346377 408447) 503534565 597628 
AB SI" Ot GS ng ass aaa) ara 
12. (Computer assignment) From the table estimate the production of steel in 1960 by
fitting a least-squares straight line to the data and using the value given by the line for
1960.

Year          | 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956
Tons of steel |  666  849  886  780  968 1052  932 1116  883 1170 1182




13. (Computer assignment) The following table gives age, height, and weight for thirteen
children.


Age SE NO DOM eee 1G) Oey 


Height | 5652-49 Ol SL 56 SS 


Weight | 64 56 556 S459 HST OTS 


(a) Find the regression equation for age as a function of height and weight.

(b) Find the regression equation for height as a function of age and weight.

(c) Find the regression equation for weight as a function of age and height.
14. (Computer assignment) Least-squares fit of $y = ae^{bx}$: If we find that the points
$(x_1, y_1), (x_2, y_2), \ldots$ lie on the curve $y = ae^{bx}$, then $y_i = ae^{bx_i}$, $i = 1, 2, \ldots$. We can convert
this to a linear system by taking the natural logarithm of both sides of the equations:
$\ln y_i = \ln a + bx_i$. Thus we may estimate ln a and b by fitting a straight line to data points
$(x_i, \ln y_i)$. Fit an exponential curve $y = ae^{bx}$ to the data


i (OE ey Vi 


ylus 6 3 28 2 


Plot the data points and the curve on the same graph. 
15. (Computer assignment) Least-squares fit of $y = ax^b$: If the point $(x_i, y_i)$ lies on the
curve $y = ax^b$, then $y_i = ax_i^b$. Taking logarithms as in exercise 14, we have
$\ln y_i = \ln a + b\ln x_i$. Thus we may estimate b and ln a by fitting a straight line to the data
points $(\ln x_i, \ln y_i)$. Fit a curve of the form $y = ax^b$ to the data of exercise 14. Plot the
curve you obtain on the same graph as the plot of exercise 14.


2.4 Affine Functions, Quadratic Forms 


The material of this section is needed for the optional sections of this chapter. The
material is not needed for Chapter 3. If the optional sections are not to be covered,
then the material may be postponed until Chapter 4.

We will use the symbol $R^n$ to denote the set of vectors v with n = ⍴v. We will
denote the set of scalars (real numbers) by R.


DEFINITION 2.11 A function $f\colon R^n \to R^m$ is a rule that assigns a vector in $R^m$ to
each of a set of vectors in $R^n$. The set of vectors in $R^n$ to which the rule applies is
called the domain of f. The collection of vectors in $R^m$ that results from applying f
to the domain is called the image.


EXAMPLE 2.32 Define $f\colon R^5 \to R^3$ by f(v) = v[1 3 5]. For example,
f((1, 7, 9, 6, −3)) = (1, 9, −3). The domain of f is all of $R^5$, since the rule applies
to all v of length 5. The image is all of $R^3$, since any vector in $R^3$ can result. This
result can happen in many ways; for example, (x, y, z) = f((x, 1, y, 2, z)) =
f((x, 0, y, 0, z)), and so on. ■


EXAMPLE 2.33 Define $f\colon R^4 \to R^4$ by f(v) = v*0.5. For example, f((1, 4, 9, 16)) =
(1, 2, 3, 4). But a vector such as (1, 2, −3, 7) is not in the domain of f, since $\sqrt{-3}$
is not a real number.

      1 2 ¯3 7*0.5
DOMAIN ERROR
      1 2 ¯3 7*0.5

The image of f is the set of vectors in $R^4$ with nonnegative components, since for
such vectors $(a, b, c, d) = f((a^2, b^2, c^2, d^2)) = (\sqrt{a^2}, \sqrt{b^2}, \sqrt{c^2}, \sqrt{d^2})$.


Notation It is convenient to eliminate the inner set of parentheses when writing
functions $f\colon R^n \to R^m$. We shall write, for example, f(1, 4, 9, 16) for f((1, 4, 9, 16)).


EXAMPLE 2.34 Given an equation AX = B with A a matrix and X, B vectors, a
least-squares "solution" was defined in Section 2.3 as a vector X that minimized
the scalar

f(X) = (B − A+.×X)+.*2

If A is m by n, then X is in $R^n$ and $f\colon R^n \to R$. ■


EXAMPLE 2.35 Suppose that one has the problem of solving a set of simultaneous
nonlinear equations, for example,


x? py? + 1 


lx 


$x \sin y - y \sin z = \tan x$

In Chapter 3 a theory of simultaneous linear equations is developed, but no
such theory exists for nonlinear equations. In general, the best one can do is try
numerical methods. Most canned programs for solving simultaneous nonlinear
equations require that the problem first be transformed into a standard form. This
form is

$$f(X) = 0, \qquad f\colon R^n \to R^n$$

In the case above, for example, one possible f is given by


2 


xy 


=(I-e= 


— 2 xyz +x — Itanx —xsiny+ysinz).  s 




EXAMPLE 2.36 A standard system of differential equations from mathematical
ecology is

$$\frac{dQ}{dt} = rQ - aPQ$$

$$\frac{dP}{dt} = -sP + bPQ$$

Here P is a population of predators ("wolves") and Q is a population of prey
("rabbits"). The term PQ is assumed to be proportional to the rate at which the
predators encounter the prey.

Thus the first equation says that the rate of increase in the rabbit population
depends on two factors. It increases with the number of rabbits but decreases with
the number of wolf-rabbit encounters.

To analyze such systems of differential equations either theoretically or numerically,
we put them into a standard form:

$$\frac{dY}{dt} = f(Y), \qquad f\colon R^n \to R^n$$

where the vector Y is differentiated componentwise. In the equation above we
have

$$Y = \begin{bmatrix}Q\\P\end{bmatrix}, \qquad \frac{dY}{dt} = \begin{bmatrix}dQ/dt\\dP/dt\end{bmatrix}$$

and $f\colon R^2 \to R^2$ is given by f(Q, P) = (rQ − aPQ, −sP + bPQ). ■
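The standard form dY/dt = f(Y) is exactly what a numerical integrator consumes. As a minimal sketch, assuming NumPy and using hypothetical rate constants and initial populations (none of these numbers come from the text), the right-hand side f and one explicit Euler step look like this. Euler's method is our illustrative choice, not one the text prescribes.

```python
import numpy as np

def predator_prey(Y, r, a, s, b):
    """Right-hand side f(Y) of dY/dt = f(Y), with Y = (Q, P).

    Q is the prey ("rabbit") population and P the predator ("wolf") population.
    """
    Q, P = Y
    return np.array([r * Q - a * P * Q, -s * P + b * P * Q])

def euler_step(Y, h, **params):
    """One explicit Euler step Y + h*f(Y); a crude integrator, used only for illustration."""
    return Y + h * predator_prey(Y, **params)

# Hypothetical rate constants and initial populations (Q, P).
params = dict(r=1.0, a=0.1, s=1.5, b=0.075)
Y0 = np.array([40.0, 9.0])
Y1 = euler_step(Y0, 0.01, **params)
```

Note that f vanishes at Q = s/b, P = r/a, where neither population changes.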


EXAMPLE 2.37 Curves in the plane are often written parametrically. For example,

$$x = 2 - t, \qquad y = 1 + 3t$$

represents a line in the plane, for if we eliminate the parameter t we find

$$y = 1 + 3(2 - x)$$

or

$$y = -3x + 7$$

The parametric form of the line can be considered a function $f\colon R \to R^2$ (t is a
scalar) given by the formula

$$f(t) = (2 - t,\ 1 + 3t)$$

Here the domain of f is all scalars and the image of f is the line y = −3x + 7.
The point f(t) sweeps out this line as t sweeps from −∞ to ∞ (Figure 2.1). ■




FIGURE 2.1 


Linear algebra is concerned with those functions $f\colon R^n \to R^m$ which can be
handled with matrix algebra.


DEFINITION 2.12 An affine (or nonhomogeneous) linear function $f\colon R^n \to R^m$ is a
function of the form

$$Y = f(X) = B + AX$$

where A is an m-by-n matrix and B is a vector in $R^m$.


DEFINITION 2.13 An affine linear function Y = B + AX is called simply linear if
B = 0. Thus a linear function $f\colon R^n \to R^m$ is a function of the form

$$Y = AX$$

If f(X) = B + AX, we will call B the constant term and AX the linear term.

Note: We often abbreviate the term "affine linear function" to "affine function."


EXAMPLE 2.38 The function of Example 2.37 can be taken to be affine if we take
the parameter t to be a vector with one component instead of a scalar. Then,
writing vectors as column vectors,

$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}2\\1\end{bmatrix} + \begin{bmatrix}-1\\3\end{bmatrix}\begin{bmatrix}t\end{bmatrix}$$

that is,
X = (t), $A = \begin{bmatrix}-1\\3\end{bmatrix}$, B = (2, 1), and Y = (x, y). ■


For the next example we find it useful to confuse R (scalars) with $R^1$ (single-component
vectors).

EXAMPLE 2.39 A real-valued affine function of a real variable $f\colon R \to R$ is a
function of the form y = f(x) = ax + b. Thus the affine functions $R \to R$ are
those functions whose graph is a straight line. This graph intersects the y axis at
the point (0, b). Thus a function $f\colon R \to R$ is linear if and only if the graph of f
is a straight line through the origin. ■


Examples 2.38 and 2.39 give two ways of using affine functions to represent a
line in the plane. In Example 2.38 the line is the image of an affine function
$f\colon R \to R^2$. In Example 2.39 the line is the graph of a function. In Chapter 4 the
method of Example 2.38 will be extended to higher-dimensional spaces.

If a function $f\colon R^n \to R^m$ is expressed in formulas involving the components
of $X = (x_1, x_2, \ldots, x_n)$, then f is affine if and only if these formulas involve only
constant terms and terms of the form $ax_i$, where a is a constant. Terms of the form
$ax_i$ are called first-degree terms; constants are zero-degree terms. In the formula
f(X) = B + AX the constant terms make up the vector B and the terms of the
form $ax_i$ come from the linear part AX. If there are no constant terms (B = 0),
then the function is linear.


EXAMPLE 2.40 The function $f\colon R^5 \to R^3$ of Example 2.32 is affine. In fact it is
linear. The definition is f(X) = X[1 3 5]. We can write this as

$$f\left(\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\end{bmatrix}\right) = \begin{bmatrix}x_1\\x_3\\x_5\end{bmatrix}$$

The only terms that appear are the first-degree terms $x_1$, $x_3$, and $x_5$, and so the
function is linear. ■


In Example 2.40 the function is linear but the matrix A is not obvious. The
next proposition gives a method of computing A as well as a simple characterization
of linear functions by two identities.




PROPOSITION 2.17 Let $f\colon R^n \to R^m$.

1. If f is linear, then the identities

$$f(X_1 + X_2) = f(X_1) + f(X_2)$$
$$f(aX) = af(X)$$

hold for all vectors X, $X_1$, $X_2$ in $R^n$ and all scalars a.

2. Conversely, if the two identities of statement 1 hold for all X, $X_1$, $X_2$ in $R^n$ and
all scalars a, then f is linear and

$$Y = f(X) = AX$$

where the matrix A is defined by A[;j] = f(I[;j]). Here I is the n-by-n identity
matrix.

Proof Statement 1 is just the definition of linear function plus Proposition 2.8.
To check statement 2, notice that if the identities hold, then

$$f(a_1X_1 + a_2X_2 + \cdots + a_kX_k) = a_1f(X_1) + a_2f(X_2) + \cdots + a_kf(X_k)$$

for any collection of vectors $X_i$ and scalars $a_i$ (see exercise 34). Thus, if
$X = (x_1, x_2, \ldots, x_n)$, we apply the definition of matrix-vector multiplication to
obtain

$$\begin{aligned} f(X) &= f(x_1I[;1] + x_2I[;2] + \cdots + x_nI[;n]) \\ &= x_1f(I[;1]) + x_2f(I[;2]) + \cdots + x_nf(I[;n]) \\ &= x_1A[;1] + x_2A[;2] + \cdots + x_nA[;n] \\ &= AX \end{aligned}$$ ■
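Statement 2 of Proposition 2.17 is also a recipe: apply f to each column of the identity and stack the results. A minimal NumPy sketch (the helper name `matrix_of` is ours) recovers the matrix for the function f(X) = X[1 3 5] of Example 2.40. Since Python indexes from 0, the APL positions 1, 3, 5 become 0, 2, 4.

```python
import numpy as np

def matrix_of(f, n):
    """Matrix A of a linear f: R^n -> R^m, with column j equal to f(I[:, j])."""
    I = np.eye(n)
    return np.column_stack([f(I[:, j]) for j in range(n)])

# The linear function of Example 2.40, f(X) = X[1 3 5] in 1-origin APL indexing.
f = lambda X: X[[0, 2, 4]]

A = matrix_of(f, 5)
X = np.array([1.0, 7.0, 9.0, 6.0, -3.0])
print(A)
```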


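EXAMPLE 2.41 Returning to Example 2.40, we see that

$$f(I[;1]) = \begin{bmatrix}1\\0\\0\end{bmatrix},\quad f(I[;2]) = \begin{bmatrix}0\\0\\0\end{bmatrix},\quad f(I[;3]) = \begin{bmatrix}0\\1\\0\end{bmatrix},\quad f(I[;4]) = \begin{bmatrix}0\\0\\0\end{bmatrix},\quad f(I[;5]) = \begin{bmatrix}0\\0\\1\end{bmatrix}$$

Thus,

$$f(X) = \begin{bmatrix}1&0&0&0&0\\0&0&1&0&0\\0&0&0&0&1\end{bmatrix}X$$ ■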




The point of Proposition 2.17 that we would like to emphasize is that a linear
function $f\colon R^n \to R^m$ is completely known when its action on the n vectors
I[;1], ..., I[;n] is known. In particular, the matrix A is unique. A general affine
function is determined by its action on n + 1 vectors: the columns of I plus the
zero vector. To see this, notice that the constant term of f(X) = B + AX is simply
f(0), and so computing f(0) gives us B. To get A we apply Proposition 2.17 to the
linear function

$$g(X) = f(X) - f(0) = AX$$
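The n + 1 probes described above translate directly into code. This minimal NumPy sketch (the helper name `affine_parts` is ours) evaluates f at the zero vector to get B, then subtracts f(0) from f applied to each identity column to get A. Applied to the parametrized line of Example 2.37, it reproduces B = (2, 1) and the single column (−1, 3) found in Example 2.38.

```python
import numpy as np

def affine_parts(f, n):
    """Return (B, A) with f(X) = B + A @ X for an affine f: R^n -> R^m.

    B = f(0); column j of A is g(I[:, j]), where g(X) = f(X) - f(0).
    """
    B = f(np.zeros(n))
    I = np.eye(n)
    A = np.column_stack([f(I[:, j]) - B for j in range(n)])
    return B, A

# The line of Example 2.37, f(t) = (2 - t, 1 + 3t), with t a one-component vector.
f = lambda t: np.array([2 - t[0], 1 + 3 * t[0]])
B, A = affine_parts(f, 1)
```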


Exampte 242. Is the function f° R° — R? given by 
f ‘|| ap ae 
|| b224+7x-4 

affine? If so, find A and B.


Solution Only constant and first-degree terms appear, and so it is affine. 


pe -1y 


0 
B=/(0) -/(h) = Al and AX = f(x) ~f0) = 80) =] 5° 


is the linear part 


fe 


and so 


and 
a) pee ake Oye 
noy=[_4]+[5 hal o 


Statements about affine functions can often be verified with a little matrix algebra.


EXAMPLE 2.43 Show that if $f\colon R^n \to R^m$ and $g\colon R^m \to R^p$ are affine, then the
composite function $g \circ f\colon R^n \to R^p$ is also affine.

Solution Recall that the composite function g ∘ f is defined by (g ∘ f)(X) = g(f(X)).
Suppose that f(X) = B + AX, g(Y) = D + CY. Then

$$(g \circ f)(X) = g(f(X)) = D + Cf(X) = D + C(B + AX) = (D + CB) + (CA)X$$

And thus (g ∘ f)(X) = F + EX, where F = D + CB is a vector in $R^p$ and E = CA is a p-by-n
matrix. ■
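A quick numerical spot check of Example 2.43, as a minimal NumPy sketch with randomly chosen (hence purely illustrative) A, B, C, D: composing the two affine maps agrees with the single affine map F + EX, and E = CA has shape p by n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 3, 4, 2
A, B = rng.standard_normal((m, n)), rng.standard_normal(m)  # f(X) = B + AX
C, D = rng.standard_normal((p, m)), rng.standard_normal(p)  # g(Y) = D + CY

f = lambda X: B + A @ X
g = lambda Y: D + C @ Y

E = C @ A        # p-by-n matrix
F = D + C @ B    # vector in R^p

X = rng.standard_normal(n)
```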


QUADRATIC FUNCTIONS

The affine functions $f\colon R^n \to R^m$ are those functions containing only terms of
degree zero (constants) and one ($ax_i$). If we replace $R^m$ by R, then we may include
second-degree terms, by which we mean terms of the form $ax_ix_j$, as well.

In order to write second-degree equations in matrix terms we use the identity
given in the next proposition. The proof of Proposition 2.18 is a variant of the
calculation used in the proof of Proposition 2.3 and is left to the reader. Notice
that if A is an n-by-n matrix and n = ⍴X, then X+.×A+.×X is a
scalar.


PROPOSITION 2.18 Let A be an n-by-n matrix and let $X = (x_1, \ldots, x_n)$. Then

X+.×A+.×X = $\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{n} A[i;j]\,x_ix_j$

To write the expression X+.×A+.×X in matrix multiplication (rather than
matrix-vector multiplication) terms, we take X to be a column vector. Then AX is
also n by 1 and $X^TAX$ is the 1-by-1 matrix with entry equal to the scalar
X+.×A+.×X.
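To see Proposition 2.18 concretely, this minimal NumPy sketch writes the double sum out literally and compares it with the matrix expression `x @ A @ x`, the NumPy counterpart of X+.×A+.×X. The helper name `double_sum` and the sample values are ours.

```python
import numpy as np

def double_sum(A, x):
    """Sum over i, j of A[i, j] * x[i] * x[j], written out as in Proposition 2.18."""
    n = len(x)
    return sum(A[i, j] * x[i] * x[j] for i in range(n) for j in range(n))

A = np.array([[1.0, 4.0],
              [0.0, 3.0]])
x = np.array([2.0, -1.0])
```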
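To see just what the double sum of Proposition 2.18 looks like, consider the
2-by-2 case,

$$\begin{bmatrix}x_1 & x_2\end{bmatrix}\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = ax_1^2 + bx_1x_2 + cx_2x_1 + dx_2^2$$

This can be shortened to

$$\begin{bmatrix}x_1 & x_2\end{bmatrix}\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = ax_1^2 + (b + c)x_1x_2 + dx_2^2$$

In this form it is clear that different matrices A give the same result $X^TAX$ as
long as the sum A[1;2] + A[2;1] remains the same. For example,

$$\begin{bmatrix}1&2\\2&3\end{bmatrix},\qquad \begin{bmatrix}1&4\\0&3\end{bmatrix},\qquad \begin{bmatrix}1&0\\4&3\end{bmatrix}$$

all give the same result: $X^TAX = x_1^2 + 4x_1x_2 + 3x_2^2$.

We take advantage of this wide choice of A's to choose A symmetric. We do
this by choosing A[1;2] = A[2;1]:

$$\begin{bmatrix}x_1 & x_2\end{bmatrix}\begin{bmatrix}a&b\\b&c\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = ax_1^2 + 2bx_1x_2 + cx_2^2$$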


PROPOSITION 2.19 Let A be n by n, $B = \tfrac{1}{2}(A + A^T)$, and let X be a column vector.
Then B is symmetric and

$$X^TAX = X^TBX$$

B is the unique symmetric matrix for which the equation above holds for all X.

Proof The symmetry of B is left as exercise 35, the uniqueness as exercise 36.
First notice that $X^TAX$ is certainly symmetric because it is a 1-by-1 matrix.
Thus

$$\begin{aligned} X^TAX &= (X^TAX)^T \\ &= X^TA^T(X^T)^T && \text{by Proposition 2.2} \\ &= X^TA^TX \end{aligned}$$

Then

$$\begin{aligned} X^TBX &= X^T\tfrac{1}{2}(A + A^T)X \\ &= \tfrac{1}{2}(X^TAX + X^TA^TX) && \text{by Proposition 2.8} \\ &= \tfrac{1}{2}(X^TAX + X^TAX) \\ &= X^TAX \end{aligned}$$ ■
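Proposition 2.19 can be checked numerically with a minimal NumPy sketch: symmetrizing the nonsymmetric matrix [1 4; 0 3] from the 2-by-2 discussion gives [1 2; 2 3], and both matrices produce the same quadratic form at a randomly chosen (illustrative) X.

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [0.0, 3.0]])
B = (A + A.T) / 2    # the symmetric matrix of Proposition 2.19

rng = np.random.default_rng(1)
X = rng.standard_normal(2)
```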


It is the fact that second-degree terms can be handled by symmetric matrices that 
makes symmetric matrices important. 


DEFINITION 2.14 The function $f\colon R^n \to R$ is an affine quadratic function if it can
be written in the form

$$f(X) = c + BX + X^TAX$$

where c is a scalar, B is a matrix, and A is a symmetric matrix (the vector X is here
confused with a single-column matrix). The constant (or zero-degree) term is c, the
linear (or first-degree) term is BX, and the quadratic (or second-degree) term is
$X^TAX$.

If the constant and first-degree terms are missing (c = 0, B = 0), then f is
called a quadratic form.

Notice that matrix B has but a single row in the definition above.


EXAMPLE 2.44 Consider the function of Example 2.34,

f(X) = (B − A+.×X)+.*2

Here B and X are vectors; A is a matrix. The function is clearly a sum of squares of
linear and constant terms, and so it should be a quadratic function. To verify this
fact take the vectors involved to be single-column matrices and use the identity
V+.*2 = $V^TV$. Then

$$\begin{aligned} f(X) &= (B - AX)^T(B - AX) \\ &= (B - AX)^TB - (B - AX)^TAX \\ &= (B^T - X^TA^T)B - (B^T - X^TA^T)AX \\ &= B^TB - X^TA^TB - B^TAX + X^TA^TAX \end{aligned}$$

The term $B^TB$ is the constant term; the quadratic term is $X^T(A^TA)X$. Now $X^TA^TB$
is a 1-by-1 matrix, hence it is symmetric. Thus $X^TA^TB = (X^TA^TB)^T =
B^T(A^T)^T(X^T)^T = B^TAX$, so that

$$f(X) = B^TB - (2B^TA)X + X^T(A^TA)X$$

is an affine quadratic function. ■
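The expansion in Example 2.44 can be spot-checked with a minimal NumPy sketch; the random A, B, X are illustrative only. The direct sum of squares and the expanded form $B^TB - (2B^TA)X + X^T(A^TA)X$ should agree to rounding error.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
B = rng.standard_normal(5)
X = rng.standard_normal(3)

direct = np.sum((B - A @ X) ** 2)                        # (B - AX)+.*2
expanded = B @ B - (2 * B @ A) @ X + X @ (A.T @ A) @ X   # constant + linear + quadratic terms
```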


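EXAMPLE 2.45 Is the function $f\colon R^3 \to R$ defined by

$$f(x, y, z) = x^2 - 2xy + 4z - 1$$

a quadratic function? If so, write the function in matrix form.

Solution Only constant, linear, and quadratic terms appear, and so the function is
quadratic. The constant term is c = −1. The linear part is g(x, y, z) = 4z. Now
g(1, 0, 0) = 0, g(0, 1, 0) = 0, g(0, 0, 1) = 4, so B = [0 0 4] by Proposition 2.17.
The quadratic part is $x^2 - 2xy$. Using Proposition 2.18 with $x = x_1$, $y = x_2$, $z = x_3$,
we see that the quadratic part is $A[1;1]x^2 + (A[1;2] + A[2;1])xy$. Since A is to be
symmetric, A[1;1] = 1, A[1;2] = A[2;1] = −1, all other entries are 0, and

$$A = \begin{bmatrix}1&-1&0\\-1&0&0\\0&0&0\end{bmatrix}$$

That is,

$$f(X) = -1 + \begin{bmatrix}0&0&4\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} + \begin{bmatrix}x&y&z\end{bmatrix}\begin{bmatrix}1&-1&0\\-1&0&0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}$$ ■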
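A minimal NumPy sketch confirms that the matrix form found in Example 2.45 reproduces the original formula; the test point is arbitrary.

```python
import numpy as np

c = -1.0
Bq = np.array([0.0, 0.0, 4.0])
Aq = np.array([[1.0, -1.0, 0.0],
               [-1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]])

def f_direct(x, y, z):
    """The formula as given in Example 2.45."""
    return x**2 - 2*x*y + 4*z - 1

def f_matrix(X):
    """The matrix form c + BX + X^T A X."""
    return c + Bq @ X + X @ Aq @ X

X = np.array([1.5, -2.0, 0.5])
```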


EXERCISES 2.4

In exercises 1 through 28 decide if the function is linear, affine, quadratic, a quadratic
form, or otherwise. In the first four cases write the matrix form of the equation.


» AG) = E533] > s(()=E—) 




x 14+ 4x— Iv dx — By 4 42 
3 = s i 
ACD = brs 55] a eee 
S. flsy,2) = 14 2x 43y 442 6 f(s), 2.0) = 292 
7. flay. 2.0) = xy + 20 8 (e+ F-34298 
9. = +2 + 3G tx — 
y V+x 
10. i f{]y )- V+) 
. lee 
12. 13, f(X) = XX in R* 
14. 15. f(X) is X+2, Nin Re 


16. f(X) is (X+2)f1), X in Re 

17, fiX)is XL 3S} +N 4 6) for Xin R® 
18, JX) is.8 2 3px for Xin RY 

19. JX) is (14) = 0x for Xin RY 

2. JON) is. (04). -x for X in RO 

21. UX) is. (4) * ox for Nin RY 

22, UX) is xe xx for Xin RY 

23, f(X) is Vex) pete xx for Xin R® 

24. f(X) = (0,0) for Xin R® 


25. / RE = R? by fix.) =" AR “Al 

26. fe RY + RM is xe om, Man n-by-m matrix 

27. PRY RY is (xe uM) ONE x, Mand N are n-by-n matrices 
28, Sketch the images of the following affine functions // R + RE 


si 
@ fo=[1 531] w po=[_'] © fo=[t ti] 


w ro=[] © $= [7573 


29. Show that if $f\colon R^n \to R^m$ and $g\colon R^m \to R^p$ are linear, then $g \circ f\colon R^n \to R^p$ is linear.
30. Let $f, g\colon R^n \to R^m$ and show that
(a) If f, g are affine, then f + g is affine.
(b) If f, g are linear, then f + g is linear.
31. Show that if $f\colon R^n \to R^m$ is affine and if $g\colon R^m \to R$ is quadratic, then $g \circ f\colon R^n \to R$
is quadratic.
32. Show that if $f, g\colon R^n \to R$ are affine, then the function h(X) = f(X)g(X) is quadratic.
Write h(X) in matrix form.
33. Let $f\colon R^n \to R^m$ by Y = f(X) = B + AX.
(a) Assuming that L is a left inverse of A, solve for X in terms of Y.
(b) Show that, if L is a left inverse of A, then the affine function g(Y) = LY − LB
is a left inverse of f in the sense that g ∘ f(X) = X.




(c) Show that if there is an affine function g(Y) = D + CY such that g ∘ f(X) = X
for all X, then A has a left inverse.

(d) Show that if there is an affine function g(Y) = D + CY such that g ∘ f(X) = X
and f ∘ g(Y) = Y for all X and Y, then A is invertible.

(e) Show that if A is invertible, then there is an affine function g such that
g ∘ f(X) = X and f ∘ g(Y) = Y for all X and Y.
34. Let $f\colon R^n \to R^m$ satisfy the identities of Proposition 2.17, statement 1. Show that

$$f(a_1X_1 + a_2X_2 + \cdots + a_kX_k) = a_1f(X_1) + a_2f(X_2) + \cdots + a_kf(X_k)$$

Hint: Let $A = a_1X_1$ and $B = a_2X_2 + \cdots + a_kX_k$, and apply statement 1 to f(A + B).
35. (a) Show that if A is a matrix and a is a scalar, then $(aA)^T = aA^T$.

(b) Show that if A is a square matrix, then $\tfrac{1}{2}(A + A^T)$ is symmetric.
36. Let A, B be n-by-n symmetric matrices. Show that if $X^TAX = X^TBX$ for all X, then
A = B.

Hint: Take X to be I[;k] for k = 1, 2, ..., n first and then use the sum of two such
vectors.


2.5* Multivariate Calculus
Derivatives, Maxima and Minima


This section assumes some familiarity with partial differentiation. The object is to
rephrase some results about partial derivatives in matrix terms. This results in a
considerable simplification of formulas.


DEFINITION 2.15 A function is said to be of class $C^k$ if all its partial derivatives of
order k or less exist and are continuous.

First consider a scalar-valued function $f\colon R^n \to R$. Writing
$f(x_1, \ldots, x_n)$, f has n first partials $\partial f/\partial x_1, \ldots, \partial f/\partial x_n$.


We collect these into a single vector and call it the derivative of f.


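DEFINITION 2.16 Let $f\colon R^n \to R$ be of class $C^1$ at p in $R^n$. The derivative of f at
p, denoted Df(p), is the vector

$$Df(p) = \left(\frac{\partial f}{\partial x_1}(p), \ldots, \frac{\partial f}{\partial x_n}(p)\right)$$

This derivative is also called the gradient of f. If p is allowed to vary, then one
has a function $Df\colon R^n \to R^n$.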




Next consider the case of a function $f\colon R^n \to R^m$. Such a function may be
considered as m scalar-valued functions:

$$f(X) = (f_1(X), f_2(X), \ldots, f_m(X))$$

where $f_k(X)$ is the kth component of the vector f(X), that is, $f_k(X) = f(X)[k]$.


DEFINITION 2.17 Let $f\colon R^n \to R^m$ be of class $C^1$ at p in $R^n$. The derivative of f at
p is the m-by-n matrix Df(p) defined by

$$Df(p)[k;\,] = Df_k(p)$$

where $f_k(X) = f(X)[k]$.

The derivative Df(p) is also called the Jacobian matrix of f at p. Simplifying
the notation by suppressing the p's, we have

$$Df = \begin{bmatrix}\partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n\\ \vdots & & \vdots\\ \partial f_m/\partial x_1 & \cdots & \partial f_m/\partial x_n\end{bmatrix}$$

If we write Y = f(X), then in the two cases defined above ($f\colon R^n \to R$ and
$f\colon R^n \to R^m$) we have (⍴Df) = (⍴Y),⍴X. In the latter case this is clear. In the
former case ⍴Y is ⍳0 and so ⍴Df is ⍴X. For a function $f\colon R \to R$ we take Df to
be df/dx. In this case ⍴X is ⍳0 as well, and the formula holds in all cases.


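EXAMPLE 2.46 Compute the derivative of the function $f\colon R^2 \to R^3$, where

$$f(x, y) = (x^2 + y^2,\ \sin xy,\ x - y)$$

Solution $f_1(x, y) = x^2 + y^2$, $f_2(x, y) = \sin xy$, $f_3(x, y) = x - y$, so

$$Df_1 = (2x,\ 2y), \qquad Df_2 = (y\cos xy,\ x\cos xy), \qquad Df_3 = (1,\ -1)$$

and

$$Df = \begin{bmatrix}2x & 2y\\ y\cos xy & x\cos xy\\ 1 & -1\end{bmatrix}$$ ■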
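A hand-computed Jacobian is easy to check numerically. This minimal NumPy sketch (the helper `numerical_jacobian` and the test point are ours) compares the matrix of Example 2.46 with central-difference estimates, column j approximating the partial derivative with respect to the jth variable.

```python
import numpy as np

def f(X):
    x, y = X
    return np.array([x**2 + y**2, np.sin(x * y), x - y])

def Df(X):
    """Jacobian from Example 2.46: row k is the gradient of f_k."""
    x, y = X
    return np.array([[2*x, 2*y],
                     [y*np.cos(x*y), x*np.cos(x*y)],
                     [1.0, -1.0]])

def numerical_jacobian(f, X, h=1e-6):
    """Central-difference approximation; column j estimates the partials in x_j."""
    X = np.asarray(X, dtype=float)
    cols = []
    for j in range(X.size):
        e = np.zeros_like(X)
        e[j] = h
        cols.append((f(X + e) - f(X - e)) / (2 * h))
    return np.column_stack(cols)

p = np.array([0.7, -1.2])
```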


PROPOSITION 2.20 Let $f\colon R^n \to R^m$ be the affine function f(X) = B + AX. Then
Df = A.

Proof $f_i(X) = (B + AX)[i] = B[i] + \sum_k A[i;k]\,x_k$, and $\partial f_i/\partial x_k = A[i;k]$. ■


The rules for differentiating sums and products easily generalize. 




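PROPOSITION 2.21 Let $f, g\colon R^n \to R^m$ be $C^1$ functions, a a scalar. Then

1. D(f + g) = Df + Dg
2. D(af) = a Df
3. D(f+.×g) = (f+.×Dg) + g+.×Df

Proof Statements 1 and 2 are left as exercises 24 and 25. Let h = f+.×g. Then
$h\colon R^n \to R$ and $h(X) = \sum_i f_i(X)g_i(X)$. By the product rule,

$$\frac{\partial h}{\partial x_j} = \sum_i g_i\frac{\partial f_i}{\partial x_j} + \sum_i f_i\frac{\partial g_i}{\partial x_j}$$

so that Dh = (g+.×Df) + (f+.×Dg). ■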
EXAMPLE 2.47 Compute Df for f(X) = X+.*2.

First Solution $f(X) = x_1^2 + x_2^2 + \cdots + x_n^2$. Thus $\partial f/\partial x_i = 2x_i$, and so $Df\colon R^n \to R^n$ by

Df(X) = 2X

Second Solution f(X) = X+.×X. By Proposition 2.20 the derivative of
g(X) = X is the n-by-n identity matrix I, so that by Proposition 2.21
Df = (g+.×I) + g+.×I, or Df(X) = 2g(X) = 2X. ■


EXAMPLE 2.48 In Section 2.4 we wrote a general quadratic function

$$f(X) = C + BX + X^TAX \qquad (A = A^T)$$

This form involves confusing the vector X with a single-column matrix and then
taking the 1-by-1 matrices BX and $X^TAX$ to be scalars. To calculate the derivative
of f we need to be a bit more careful. Thus take C a scalar, B a vector, X a vector,
and $A = A^T$ a matrix. Then the general quadratic function $f\colon R^n \to R$ is

f(X) = C + (B+.×X) + X+.×A+.×X

The functions C + B+.×X, A+.×X, and X = I+.×X are all affine functions
to which Proposition 2.20 applies. Thus

Df(X) = D(C + B+.×X) + D(X+.×A+.×X)
      = B + (X+.×D(A+.×X)) + (A+.×X)+.×DX
      = B + (X+.×A) + (A+.×X)+.×I
      = B + 2A+.×X   (matrix-vector multiplication)

The last step follows, since $A = A^T$ implies (A+.×X) = X+.×A (exercise 26). ■
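The formula Df(X) = B + 2AX of Example 2.48 can be confirmed numerically. In this minimal NumPy sketch the symmetric A, the vector B, and the scalar C are random or arbitrary illustrative values; a central difference is exact for a quadratic up to rounding, so the agreement should be very close.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2            # symmetric, as Example 2.48 assumes
B = rng.standard_normal(n)
C = 2.5

f = lambda X: C + B @ X + X @ A @ X
grad = lambda X: B + 2 * A @ X    # formula of Example 2.48

def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function."""
    X = np.asarray(X, dtype=float)
    g = np.zeros_like(X)
    for j in range(X.size):
        e = np.zeros_like(X)
        e[j] = h
        g[j] = (f(X + e) - f(X - e)) / (2 * h)
    return g

X = rng.standard_normal(n)
```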


Propositions 2.20 and 2.21 are simply a matter of notation. Proofs of deeper
results about derivatives are beyond the scope of this text. The next proposition,
the chain rule, is proved in texts on advanced calculus.


PROPOSITION 2.22 (Chain rule) Let $f\colon R^n \to R^m$ be of class $C^1$ at p and
$g\colon R^m \to R^k$ be of class $C^1$ at f(p). Then the composite function $g \circ f\colon R^n \to R^k$ is
of class $C^1$ at p and

D(g ∘ f)(p) = Dg(f(p))+.×Df(p)

Note: If Dg is a scalar, replace +.× by ×. The APL default definition of
scalar+.×vector is not wanted here.


EXAMPLE 2.49 Compute the derivative of $f\colon R^n \to R$ given by
f(X) = (B − A+.×X)+.*2, B a vector and A a matrix.

Solution This is the function that is minimized by a least-squares solution of
AX = B. If we set g(X) = B − AX and h(Y) = Y+.*2, then f = h ∘ g. The derivative
of g is −A by Proposition 2.20, and the derivative of h was computed in
Example 2.47. Thus,

Df(X) = Dh(g(X))+.×Dg(X)
      = 2(B − AX)+.×(−A)
      = −2(B − AX)+.×A

The expression (B − AX)+.×A is a vector-matrix multiplication that was
defined in Section 1.3 but not used. It may be rewritten as $A^T(B - AX)$
(exercise 26).
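A minimal NumPy sketch checks the derivative of Example 2.49 against central differences and against the rewritten form $-2A^T(B - AX)$; random A and B serve as illustrative data. It also confirms that the derivative vanishes at the solution of the normal equations, which anticipates Proposition 2.26.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
B = rng.standard_normal(6)

f = lambda X: np.sum((B - A @ X) ** 2)   # (B - AX)+.*2
Df = lambda X: -2 * (B - A @ X) @ A      # -2 (B - AX)+.×A, a vector-matrix product

X = rng.standard_normal(3)
h = 1e-6
numeric = np.array([(f(X + h * e) - f(X - h * e)) / (2 * h) for e in np.eye(3)])

# Least-squares solution from the normal equations A^T A X = A^T B.
X_ls = np.linalg.solve(A.T @ A, A.T @ B)
```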

The proof of the next proposition may be found in texts on advanced calculus,
as may a rigorous definition of "interior."


PROPOSITION 2.23 (Max-min test) Let $f\colon R^n \to R$ be of class $C^1$ and let p be a
point in the interior of the domain of f. If f has a local maximum or a local
minimum at p, then

Df(p) = 0 ■




DEFINITION 2.18 A critical point of a $C^1$ function $f\colon R^n \to R^m$ is a point p for
which Df(p) = 0.


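EXAMPLE 2.50 Find the critical points of

$$f(x, y) = 4 + 6x + 2y + 8x^2 + 10xy + y^2$$

Solution f is a quadratic function,

$$f(x, y) = 4 + \begin{bmatrix}6 & 2\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} + \begin{bmatrix}x & y\end{bmatrix}\begin{bmatrix}8&5\\5&1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = C + BX + X^TAX$$

By Example 2.48, Df(x, y) = B + 2AX. Thus the critical points are the solutions
of B + 2AX = 0.

The matrix A is invertible and

$$A^{-1} = -\frac{1}{17}\begin{bmatrix}1&-5\\-5&8\end{bmatrix}$$

so

$$X = -\tfrac{1}{2}A^{-1}B = \frac{1}{34}\begin{bmatrix}1&-5\\-5&8\end{bmatrix}\begin{bmatrix}6\\2\end{bmatrix} = \frac{1}{34}\begin{bmatrix}-4\\-14\end{bmatrix} = \begin{bmatrix}-2/17\\-7/17\end{bmatrix}$$

The only critical point is (−2/17, −7/17), that is, 2 7÷¯17. ■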
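The critical point of Example 2.50 is the solution of a linear system, so a minimal NumPy sketch needs only one call: solve 2AX = −B. This avoids forming $A^{-1}$ explicitly, which is generally the better numerical habit.

```python
import numpy as np

B = np.array([6.0, 2.0])
A = np.array([[8.0, 5.0],
              [5.0, 1.0]])

# Critical points solve B + 2AX = 0, i.e. 2AX = -B.
X = np.linalg.solve(2 * A, -B)
```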


It is not immediately obvious if the critical point of the last example gives a
maximum, a minimum, or neither a maximum nor a minimum. There is a generalization
of the second-derivative test for maxima and minima to the present
context. First we need to define the second derivative.


Second Derivatives 


In this section we will compute second derivatives only for scalar-valued functions
$f\colon R^n \to R$. In this case the definition is clear. The first derivative Df(X) is a
function $Df\colon R^n \to R^n$ and we define the second derivative to be

$$D^2f = D(Df)$$

This definition implies the formula given in the next proposition, which is left as
exercise 27.




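PROPOSITION 2.24 Let $f\colon R^n \to R$ be of class $C^2$. The second derivative of f is a
symmetric matrix and satisfies

$$D^2f[i;j] = \frac{\partial^2 f}{\partial x_i\,\partial x_j}$$

This second derivative is also called the Hessian matrix. The symmetry comes
from the fact that for $C^2$ functions,

$$\frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}$$

This last fact is proved in advanced calculus texts.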


EXAMPLE 2.51 Compute the second derivative of the general quadratic function
$f(X) = C + BX + X^TAX$.

Solution We saw in Example 2.48 that Df(X) = B + 2AX, an affine function;
hence, by Proposition 2.20, $D^2f(X) = 2A$. ■


The proof of the next proposition, the second-derivative test, is part of the
subject matter of advanced calculus.


PROPOSITION 2.25 (Second-derivative test) Let $f\colon R^n \to R$ be of class $C^2$, and let
p be a critical point of f (Df(p) = 0). Assume that the Hessian matrix $D^2f(p)$ is
invertible. Let Q(X) = X+.×D²f(p)+.×X. Then the behavior of f at p is
given by the behavior of Q at X = 0. In particular,

1. If Q has a maximum at X = 0, then f has a local maximum at X = p.
2. If Q has a minimum at X = 0, then f has a local minimum at X = p.
3. If Q has neither a minimum nor a maximum at X = 0, then f has neither a
maximum nor a minimum at X = p.

If $D^2f(p)$ is singular, the test fails.


In general the behavior of the function Q(X) of Proposition 2.25 is not obvious
from a casual inspection of the Hessian matrix. Methods for analyzing such
functions will be developed in Section 5.2. Those methods are not always needed,
however. We can, for example, use Proposition 2.25 to prove the least-squares
approximation result, Proposition 2.15.


PROPOSITION 2.26 Let A be an m-by-n matrix with linearly independent columns,
B a vector in $R^m$. The function

f(X) = (B − A+.×X)+.*2

has a unique minimum at $X = (A^TA)^{-1}A^TB$, the unique solution of $A^TAX = A^TB$.

Proof By Example 2.49 and the following remark, the critical points are given by
$A^T(B - AX) = 0$ or

$$A^TAX = A^TB$$

If A has linearly independent columns, then $A^TA$ is invertible, so that the only
critical point is $X = (A^TA)^{-1}A^TB$. Since $Df(X) = -2A^T(B - AX) = -2A^TB + 2A^TAX$,
we have $D^2f(X) = 2A^TA$. Thus

$$Q(X) = 2X^TA^TAX = 2(AX)^T(AX) \ge 0$$

and Q(0) = 0. Thus f has a minimum at $(A^TA)^{-1}A^TB$, that is, at B⌹A. ■


In Proposition 2.25, Q(0) = 0. It follows that Q(X) has a maximum at X = 0
if and only if Q(X) ≤ 0 for all X, and Q(X) has a minimum at X = 0 if and only if
Q(X) ≥ 0 for all X.


EXAMPLE 2.52 Discuss the critical points of the function

$$f(x, y) = 3 + 4x - 6y + 2xy$$

Solution The function f(x, y) is a quadratic function. Applying Example 2.51, we
have

$$Df(x, y) = (4 + 2y,\ -6 + 2x), \qquad D^2f(x, y) = 2\begin{bmatrix}0&1\\1&0\end{bmatrix}$$

There is a single critical point at (3, −2) and

$$Q(x, y) = 4xy$$

Since Q(1, 1) = 4 > 0 and Q(1, −1) = −4 < 0, the function Q(X) has neither a
maximum nor a minimum at X = 0; hence the function f(x, y) has neither a maximum
nor a minimum at the point p = (3, −2). ■
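The steps of Example 2.52 can be replayed in a minimal NumPy sketch: since Df = B + HX with H = D²f, the critical point solves HX = −B, and evaluating Q at (1, 1) and (1, −1) exhibits the sign change that rules out both a maximum and a minimum.

```python
import numpy as np

B = np.array([4.0, -6.0])     # linear coefficients of f(x, y) = 3 + 4x - 6y + 2xy
H = np.array([[0.0, 2.0],
              [2.0, 0.0]])    # D²f, constant for a quadratic function

Q = lambda X: X @ H @ X       # Q(X) = X+.×D²f(p)+.×X

# Df(X) = B + HX, so the critical point solves HX = -B.
p = np.linalg.solve(H, -B)
```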




EXERCISES 2.5*
In exercises 1 through 10 compute the derivatives of the given functions.


Seayaxtyet 2 fixy) =e +yt 


MY 2) = ("+ yx 2) 4, flx.y, 2) = (60s x, sin y, tan 2) 
6. FRY + R" by f\N) = 3X 
FRY > R by fiX) =X 93 8. FRI RS by f(X) = B23 pV 


1 
3 
5. fit) = (cos, sin 1.1) 
7. 
9. 


FR —> Re by (XY) = IN 
10. $f\colon R^n \to R^m$ by f(X) = X+.×A, A an n-by-m matrix
11. Let a be a real number. Show that D(X+.*a) is a×X*a−1.
12. Let a be a real number. Show that D(X*a) is the matrix a×I×(2⍴⍴X)⍴X*a−1,
where I is an identity matrix.
13. Show that D(÷X) = −I×(2⍴⍴X)⍴÷X*2, where I is an identity matrix.
14. Use the chain rule and exercise 11 to compute the derivative of f(X) = (B+AX)+.*a, a a real number.
15. Use the chain rule to compute the derivative of f(X) = ln(X+.*2).


In exercises 16 through 23 discuss the critical points of the given functions. 


16. foxy) = tot ty? 17. fly) = 1 +8 = y? 
18. fix, Veytot eng 19% fry oy 
20, fix.) = et + yt = Ast + 9) 2 fiXy= Xe 3 


2 fly) 
23. flxy) = xe" + ye? atp = (1-1) 
24. Prove statement 1 of Proposition 2.21.
25. Prove statement 2 of Proposition 2.21 
26. Let A be m by n and let m Show that 


(4.x A) = (04) 4.x F 


In particular, if A is symmetric, (VW +.% A) = 4 4.x V 
27. Let (RY + R, Show that D*/i; /] = e//8%, ox, 


2.6* Multivariate Calculus 
Linearization, Newton’s Method 


DEFINITION 2.19 Let f: R^n → R^m be a function of class C¹. The affine approximation or linearization of f at a point p in R^n is the affine function

$$A(X) = f(p) + Df(p)(X - p)$$

Notice that A(p) = f(p) and DA(p) = Df(p). The function A is the unique affine transformation with the same value at p as f and the same derivative at p as f.
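A numerical sketch of Definition 2.19 in Python. It assumes f is given as a Python function on lists of floats and, since no formula for Df is supplied, estimates Df(p) by central differences (a numerical stand-in, not something the text prescribes). The helper names and the sample f are hypothetical.

```python
import math


def jacobian(f, p, h=1e-6):
    """Central-difference estimate of Df(p), returned as m rows of n entries."""
    cols = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(plus), f(minus))])
    return [list(row) for row in zip(*cols)]   # transpose columns into rows


def linearize(f, p):
    """Return the affine approximation A(X) = f(p) + Df(p)(X - p)."""
    fp = f(p)
    D = jacobian(f, p)

    def A(x):
        d = [xi - pi for xi, pi in zip(x, p)]
        return [fi + sum(dij * dj for dij, dj in zip(row, d))
                for fi, row in zip(fp, D)]

    return A


def f(v):
    x, y = v
    return [x * y, math.sin(x) + y]


A = linearize(f, [1.0, 2.0])
print(A([1.0, 2.0]), f([1.0, 2.0]))      # A(p) = f(p)
print(A([1.01, 1.99]), f([1.01, 1.99]))  # close to f near p
```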




In advanced calculus texts the Taylor expansion of f at p is defined. The affine approximation defined here coincides with the first two terms of the Taylor expansion. For real-valued functions we can write one more term in matrix notation. The next proposition is easily checked if one has the definition of higher-dimensional Taylor series at hand.


PROPOSITION 2.27 Let f: R^n → R be a function of class C² at p in R^n. The first three terms of the Taylor expansion of f are

$$Q(X) = f(p) + Df(p)(X - p) + \tfrac{1}{2}(X - p)^T D^2f(p)(X - p)$$

∎


Generally speaking, a nonlinear problem is "linearized" by replacement of some function f by its affine approximation A at some point p. We will give two examples of this: Newton's method for approximating roots and the linearization of a differential equation.


Newton's Method 


Suppose that one has n nonlinear equations in n unknowns to solve. In Example 2.35 it was shown that such a problem can be put in the form

$$f(X) = 0, \qquad f: R^n \to R^n$$

Further, if f(X) = B + AX and A is invertible, then we can solve f(X) = 0. The answer is X = −B⌹A.

Newton's method seeks to approximate a solution ever more closely, starting with an initial (we hope educated) guess X_0.

Given an approximation X_k to a solution of f(X) = 0, the next approximation X_{k+1} is the root of the affine approximation to f at X_k:

$$\begin{aligned} A(X) &= f(X_k) + Df(X_k)(X - X_k) = 0 \\ Df(X_k)(X - X_k) &= -f(X_k) \\ X_{k+1} &= X_k - \big(Df(X_k)\big)^{-1} f(X_k) \end{aligned}$$

In APL notation, X_{k+1} = X_k − f(X_k)⌹Df(X_k).


It is shown in numerical analysis texts that X_k converges to a root of f if the initial approximation is close enough.

If f: R → R, then the graph of A(x) = f(x_k) + f′(x_k)(x − x_k) is a line tangent to the graph of y = f(x) at the point (x_k, f(x_k)). This is illustrated in Figure 2.2a. Point x_{k+1} is where this line crosses the x axis. Problems with the method are illustrated in Figures 2.2b and 2.2c.


It is impractical to carry out actual calculations until the techniques of recursive function writing in APL are covered in Chapter 3. The calculations for the next example will be found in Section 3.7*.

FIGURE 2.2 (panels a, b, c)


EXAMPLE 2.53 Consider the system of simultaneous nonlinear equations

$$\frac{x^2}{16} + \frac{y^2}{4} = 1, \qquad x^2 - y^2 = 1$$

These equations have been put in canonical form for easy recognition. The first is an ellipse with center at (0, 0), x intercepts ±4, and y intercepts ±2. The second is a hyperbola with asymptotes y = ±x and x intercepts ±1 (see Figure 2.3). There are four solutions, placed symmetrically around the origin: (±2, ±√3).


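Clearing the denominators in the first equation, we get the equivalent system

$$f(X) = \begin{bmatrix} x^2 + 4y^2 - 16 \\ x^2 - y^2 - 1 \end{bmatrix} = 0$$

FIGURE 2.3

Thus

$$Df(X) = \begin{bmatrix} 2x & 8y \\ 2x & -2y \end{bmatrix}$$

Starting with X_0 = (1, 1), we have

$$X_1 = X_0 - Df(X_0)^{-1}f(X_0) = \begin{bmatrix}1\\1\end{bmatrix} - \begin{bmatrix}2 & 8\\2 & -2\end{bmatrix}^{-1}\begin{bmatrix}-11\\-1\end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix} + \frac{1}{20}\begin{bmatrix}30\\20\end{bmatrix} = \begin{bmatrix}2.5\\2\end{bmatrix}$$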


Continuing in this manner, we obtain the following table (rounded off to three digits; this table is computed in Chapter 3, Section 3.7*):

k | X_k          | X_k − (2, √3)
0 | (1, 1)       | (−1, −.732)
1 | (2.5, 2)     | (.5, .268)
2 | (2.05, 1.75) | (.05, .0179)
3 | (2.00, 1.73) | (6.10E−4, 9.20E−5)
4 | (2.00, 1.73) | (9.29E−8, 2.45E−9)
5 | (2.00, 1.73) | (2.17E−15, 5.52E−18)


If one starts with X_0 = (.1, .1), one first jumps away to (20, 15.1) and then comes back within about 10^{−5} of the answer in ten steps. Notice that

$$Df(X)^{-1} = \frac{1}{20xy}\begin{bmatrix} 2y & 8y \\ 2x & -2x \end{bmatrix}$$

does not exist on the x axis (y = 0) or the y axis (x = 0), and hence the method breaks down there.
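The iteration of Example 2.53 can be reproduced with a short Python sketch (the book automates it in APL in Section 3.7*; this is only an illustration, and the function names are hypothetical). Because the system is 2-by-2, the sketch inverts Df(X) with the explicit 2-by-2 formula rather than a general solver. The third printed item is X_k − (2, √3), the error column of the table.

```python
def f(x, y):
    """Example 2.53 with the denominators cleared."""
    return x * x + 4 * y * y - 16, x * x - y * y - 1


def newton_step(x, y):
    """One step X_{k+1} = X_k - Df(X_k)^(-1) f(X_k) for this 2-by-2 system."""
    a, b, c, d = 2 * x, 8 * y, 2 * x, -2 * y   # Df = [[a, b], [c, d]]
    det = a * d - b * c                         # equals -20xy
    if det == 0:
        raise ZeroDivisionError("Df(X) is singular on the x or y axis")
    f1, f2 = f(x, y)
    # Df^(-1) f by the 2-by-2 inverse formula.
    dx = (d * f1 - b * f2) / det
    dy = (-c * f1 + a * f2) / det
    return x - dx, y - dy


x, y = 1.0, 1.0
for k in range(6):
    print(k, (round(x, 5), round(y, 5)), (x - 2, y - 3 ** 0.5))
    x, y = newton_step(x, y)
```

Calling newton_step at a point with x = 0 or y = 0 raises an error, which is the breakdown described above.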


EXAMPLE 2.54 We will use Newton's method to derive an algorithm for estimating square roots on a calculator with only the four functions +, −, ×, ÷.

To find the square root of a is to find a root of f(x) = x² − a. Here f: R → R, and so Df(x) = f′(x) = 2x and the Newton iteration becomes

$$x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right)$$

For a = 2 and x_1 = 1 we get

k | x_k       | x_k²
1 | 1         | 1
2 | 1.5       | 2.25
3 | 1.41667   | 2.007
4 | 1.4142156 | 2.0000057
5 | 1.4142135 | 1.9999998


For a calculator with a single memory this calculation can be quite rapid. First store x_1 in the memory, then cycle through the sequence: a, ÷, memory recall, +, memory recall, ÷, 2, memory store. After each cycle, x_{k+1} is stored in the memory.
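The key sequence above computes x ← (a ÷ x + x) ÷ 2. A Python sketch of the same loop (illustrative; the name calculator_sqrt is hypothetical) uses only the four arithmetic operations and keeps every iterate so the table can be rebuilt:

```python
def calculator_sqrt(a, x=1.0, cycles=5):
    """Estimate sqrt(a) using only +, -, *, / (Example 2.54).

    Each pass mirrors the keys: a, divide, recall, add, recall, divide, 2, store,
    i.e. x <- (a / x + x) / 2.
    """
    history = [x]
    for _ in range(cycles):
        x = (a / x + x) / 2
        history.append(x)
    return history


for k, xk in enumerate(calculator_sqrt(2.0, 1.0, 4), start=1):
    print(k, xk, xk * xk)
```

Python works in double precision, so its digits beyond the eighth will differ from an 8-digit calculator display.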


We will return to Newton's method in Chapter 3, where it will be automated.


DIFFERENTIAL EQUATIONS 


A standard technique used for studying differential equations is linearization. Suppose that a differential equation has been put in the standard form

$$\frac{dY}{dt} = f(Y), \qquad f: R^n \to R^n$$

All differential equations can be put in this form. (This is illustrated in the examples below. If t appears explicitly, however, then there are no critical points; see exercise 19.)

Notice first that if Y_0 is a root of the equation

$$f(Y) = 0$$

then the constant function Y(t) = Y_0 is a solution of the differential equation, because

$$\frac{dY}{dt} = 0 = f(Y_0)$$

Points Y_0 are called critical points of the differential equation. We study the behavior of the differential equation near Y_0 by solving the differential equation

$$\frac{dY}{dt} = a(Y)$$

where a(Y) is the affine approximation of f at Y_0. Now a(Y) = f(Y_0) + Df(Y_0)(Y − Y_0) = Df(Y_0)(Y − Y_0), since f(Y_0) = 0. Introduce the new variable H = Y − Y_0, or Y = Y_0 + H. Then

$$\frac{dY}{dt} = \frac{d}{dt}(Y_0 + H) = \frac{dY_0}{dt} + \frac{dH}{dt} = \frac{dH}{dt}$$

and so one has

$$\frac{dH}{dt} = Df(Y_0)H$$

This is a linear differential equation. A technique for solving linear differential equations is given in Chapter 7, Section 7.6*.
EXAMPLE 2.55 Find the critical points of the differential equation

$$\frac{dx}{dt} = ax - x^2$$

Write the linearized differential equation for each critical point.


4 Recall that if 


Solution The critical points are the solutions of ax − x² = 0, namely x_0 = 0 and x_0 = a. Here Df(x) = a − 2x.

For x_0 = 0 the linearized equation is

$$\frac{dh}{dt} = Df(0)h = ah \qquad (\text{the solution is } ce^{at})$$

For x_0 = a the linearized equation is

$$\frac{dh}{dt} = Df(a)h = -ah \qquad (\text{the solution is } ce^{-at})$$
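A quick numerical check of Example 2.55 in Python, assuming the hypothetical value a = 2 (the text keeps a general). It evaluates f and Df(x) = a − 2x at both critical points, then integrates the nonlinear equation from a small displacement off x_0 = a with forward Euler (a crude integrator chosen only for simplicity) and compares the deviation with the linearized prediction h_0·e^{−at}.

```python
import math

A_PARAM = 2.0  # hypothetical value of the constant a


def f(x):
    return A_PARAM * x - x * x


def df(x):
    return A_PARAM - 2 * x  # derivative of a*x - x^2


critical = [0.0, A_PARAM]  # roots of a*x - x^2 = 0
for x0 in critical:
    print(x0, f(x0), df(x0))  # f vanishes; df is the linearized coefficient


def euler(x, t_end, n=100000):
    """Integrate dx/dt = f(x) from time 0 to t_end with forward Euler."""
    h = t_end / n
    for _ in range(n):
        x += h * f(x)
    return x


# Near x0 = a the deviation h(t) should behave like h0 * exp(-a t).
h0, t = 1e-4, 1.0
print(euler(A_PARAM + h0, t) - A_PARAM, h0 * math.exp(-A_PARAM * t))
```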


EXERCISES 2.6*


In exercises 1 through 5 find the affine approximation to the given function f at the point indicated. For functions R^n → R and R → R^n take Df to be a single-row matrix and a single-column matrix, respectively.


Ax ¥) = (x? + 9,7 — y®), p = (0,0), p = (1) 


1 

2. fli. y 2) = (0, 1E92). p = (0.0.0). p = 111) 
3. f(X) = B + AX. any p 

4. f(X) = X+.*2, any p

$. AO = (cost, sin 1.1), p = 0. p = 26 


Let f: R → R^m for m = 2, 3. Then the image of f is a curve in R² or R³. The derivative Df(p) is a vector with m components, and in this case we take the affine approximation to be A(t) = f(p) + tDf(p). The image of A(t) is a straight line. It is the line tangent to the curve at the point f(p). In exercises 6 through 10 sketch the curves and the lines tangent to them at the points indicated.


6 fi =A p =A) =00 
7. f(t) = (cos t, sin t), p = f(π/4) = (1/√2, 1/√2)
8. flo = Hoos isin. p = fle2) 
9% fit) =(costsint. p= 0) 
10. f(t) = (cosh t, sinh t), p = f(0)


In exercises 11 through 15 a system of simultaneous nonlinear equations is given. Using Newton's method, compute X_1 from the given X_0. (A computer is necessary for exercises 14 and 15.)




15. + 4 
sin (xyzw) = 0 
x+y tz+w=0 

x+ylnzw=4, %) = (1.2.3.4) 


In exercises 16 through 18 find the critical points of the given differential equation. Write 
the linearized differential equation for each critical point. 


dx 


is, 


= 


yor = 1) 


19. Write the differential equation d²x/dt² = tx in vector form by taking y_1 = t, y_2 = x, and y_3 = dx/dt. Show that there are no critical points.


20. Devise an algorithm to compute cube roots on a four-function calculator with memory.


CHAPTER THREE 


Systems of Linear Equations 


In this chapter we will develop the theory of general systems of linear equations. The tool we will use is the row-echelon form of a matrix. Until now we have been unable to solve systems of linear equations for which the columns of the coefficient matrix are linearly dependent. The solutions of such systems can be read off from the row-echelon form of the augmented matrix of the system. The solution technique, known as Gaussian reduction, is developed in Section 3.2. The theory associated with the echelon form of a matrix is developed in Section 3.3, and the solution process is coded for machine execution in Section 3.5.

The APL necessary to automate Gaussian reduction is developed in Sections 3.1 and 3.4. The rest of the sections are optional; they will not be needed later in the text. The APL techniques developed in Section 3.4 make it easy to compute high powers of a matrix, and in Section 3.6* we sketch two applications: linear difference equations and Markov chain matrices. In Section 3.7* we give some indication of what can be done with nonlinear equations. This section includes coding that makes the Newton's method algorithm of Section 2.6* much easier to use. Finally, in Section 3.8* we give an example of what can be done when the automatic Gaussian reduction developed in Section 3.5 does not work well.

In Chapter 2, two important results were left unproved: (1) the result that a 
matrix has a left inverse if and only if it has linearly independent columns (Propo- 
sition 2.9) and (2) the fact that an invertible matrix must be square (Proposition 
2.5). These statements are proved in Section 3.3. 


3.1 APL Functional Notation 


Recall the general definition of a function. 

DEFINITION 3.1 A function, f, is a rule (i.e., an unambiguous procedure) that assigns to each element of a set D, called the domain of the function, a unique element of a set R, called the range of the function. If y in R is assigned to x in D, we write y = f(x).


In elementary mathematics the sets D and R consist of real numbers and the functions are usually given by a formula such as y = f(x) = sin x or y = f(x) = √(1 − x²). We will have need of much more general types of functions and in fact have already encountered them. For example, the transpose function ⍉ assigns to each matrix A a unique matrix A^T. The domain of ⍉ is the set of all matrices, and the range of ⍉ is also the set of all matrices. Another example is the index generator ⍳. The domain of ⍳ is the set of nonnegative integers {0, 1, 2, 3, ...} and the range is, say, the set of all vectors.

The range of a function is to some extent a matter of choice. The smallest possible range is called the image. The image consists of all y in R of the form y = f(x) for some x in the domain D. Thus, since A = ⍉⍉A, the image of ⍉ is the set of all matrices, but the image of ⍳ is much smaller than the set of all vectors.

Procedures for defining and using a large class of functions are part of APL, and we will make constant use of this fact from now on.

For example, we will often need identity matrices of various sizes. Recall that the n-by-n identity matrix is defined by I[j;k] = 1 if j = k and is zero otherwise. That is, I←J∘.=K where J←K←⍳N.

It becomes tedious to go through this procedure each time. We need a function, call it ID rather than f, for which ID(n) is the n-by-n identity matrix. This function is defined in APL by

      ∇Z←ID N
[1]   Z←(⍳N)∘.=⍳N
      ∇
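For readers following along without an APL system, the same outer-product construction can be sketched in Python (an illustration only, not part of the book's workspace; the name identity is hypothetical). The identity is the table of j = k over the index vector 1, ..., n, mirroring (⍳N)∘.=⍳N.

```python
def identity(n):
    """n-by-n identity built as the outer 'equals' table of 1..n with itself."""
    idx = range(1, n + 1)                      # plays the role of ⍳N
    return [[int(j == k) for k in idx] for j in idx]


for row in identity(3):
    print(row)
```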


Function definitions are set off by a pair of dels (∇). The first line, which is unnumbered, is called the header. In the example above the header is Z←ID N. The header indicates the way the function is to be used. The function name is ID, the argument is N, and the result is Z. In conventional mathematical notation this header would be Z = ID(N).

The numbered lines of the function (there is only one line in this example) define the result, Z, in terms of the given argument, N.

Notice that parentheses are not needed in APL. The expressions ID(3) and ID 3 produce the same result.

To enter a function such as ID one types a del followed by the header. The del switches APL from calculator mode to function definition mode.

The machine replies by giving the line number [1], and the user types the first line. The machine continues to supply line numbers until the user types a ∇ instead of a new line. The second ∇ switches the machine back to calculator mode, and the function is ready for use. (Procedures for correcting mistakes, deleting lines, and inserting lines vary from machine to machine.)




Note: The function ID will be used from now on. A copy should be saved in your machine's workspace file. Useful functions should be added to this file as you work through the text.


EXAMPLE 3.1 Use APL notation to define the following function.

Function name: NORM
Function input: Any vector V
Function output: The square root of the sum of the squares of the components of V

Solution
      ∇Z←NORM V
[1]   Z←(+/V*2)*0.5
      ∇
      NORM 1 2 3
3.742


EXAMPLE 3.2 Use APL notation to define the following function:
Name: TOTWO
Input: Any nonnegative integer n

Output: $\sum_{i=0}^{n} (1/2)^i$


Solution 
arson 
(2a 
rom 
ve 
ee 
nite 
in 
: eee 


In more complex computations it is often convenient to store intermediate results temporarily. For example, suppose we wish to approximate e^x by the first 26 terms of the Maclaurin expansion of e^x:

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$

The most convenient way to compute this is

      K←0,⍳25
      +/(X*K)÷!K


An APL function that does this is

      ∇Y←EXP X;K
[1]   K←0,⍳25
[2]   Y←+/(X*K)÷!K
      ∇


The variable K is called a local variable. The local variable exists only while the function is running. When the function finishes, the local variable disappears. We define the local variables by listing them after the argument, separated by semicolons, as in the expression

      ∇Z←FCN X;A;B;C

which is the header of a function with three local variables A, B, and C.
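The same 26-term sum can be sketched in Python (for illustration; the name exp_series is hypothetical). As with the local variable K, the loop index k exists only while the function runs.

```python
import math


def exp_series(x, terms=26):
    """Sum of the first `terms` terms of the Maclaurin series of e**x.

    k runs over 0, 1, ..., terms - 1, like K←0,⍳25 in the APL version.
    """
    return sum(x ** k / math.factorial(k) for k in range(terms))


print(exp_series(1.0), math.exp(1.0))
```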


EXAMPLE 3.3 In Section 1.3 we gave the computations necessary to change a matrix of raw scores to a matrix of Z-scores. Write an APL function, ZSCORE, which takes a matrix of raw scores as an argument and returns a matrix of standard scores.

Solution

      ∇Z←ZSCORE A;MEANS;SD
[1]   MEANS←(+⌿A)÷(⍴A)[1]
[2]   A←A-(⍴A)⍴MEANS
[3]   SD←((+⌿A*2)÷(⍴A)[1])*÷2
[4]   Z←A÷(⍴A)⍴SD
      ∇


      A
60 120
61 120
62 135
63 135
64 130
65 135
66 150
68 140
69 170
70 145
71 160
72 160
74 160
75 160
76 175


      ZSCORE A
¯1.530 ¯1.570
¯1.340 ¯1.570
¯1.140 ¯0.675
¯0.939 ¯0.675
¯0.741 ¯0.973
¯0.542 ¯0.675
¯0.344  0.219
 0.0529 ¯0.377
 0.251  1.410
 0.450 ¯0.0795
 0.648  0.814
 0.847  0.814
 1.240  0.814
 1.440  0.814
 1.640  1.710


This is the same answer we obtained in Section 1.3.
The argument and result of a function are also local variables and are not confused with workspace variables of the same name. In the last example the local variable A is changed on line [2] of ZSCORE, but the original matrix of raw data, also called A, is not changed.
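Example 3.3 can be cross-checked with a Python sketch (not the book's code; the names are hypothetical). Like the APL version, it subtracts the column means and divides by the population standard deviation, that is, it divides by the number of rows rather than by one less. With the data above it gives means 67.7 and 146 and standard deviations 5.04 and 16.8 to three figures.

```python
import math


def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def column_sds(rows):
    """Population standard deviation of each column (divide by n)."""
    n = len(rows)
    means = column_means(rows)
    return [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(zip(*rows), means)]


def zscore(rows):
    means, sds = column_means(rows), column_sds(rows)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


A = [[60, 120], [61, 120], [62, 135], [63, 135], [64, 130], [65, 135], [66, 150],
     [68, 140], [69, 170], [70, 145], [71, 160], [72, 160], [74, 160], [75, 160],
     [76, 175]]
print([round(m, 1) for m in column_means(A)], [round(s, 2) for s in column_sds(A)])
for row in zscore(A)[:3]:
    print([round(z, 3) for z in row])
```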
The examples above are monadic functions. Dyadic function definitions are also permitted. The header of a dyadic function takes the form

      ∇Z←X FCN Y

and corresponds to the standard mathematical notation z = fcn(x, y). The variable X is the left argument and the variable Y is the right argument. Local variables are defined as with monadic functions:

      ∇Z←X FCN Y;A;B;C


The next example gives a function that is very useful in a variety of contexts.


EXAMPLE 3.4 Given an interval [a, b] and a number n, we wish to divide the interval [a, b] into n subintervals of equal length with endpoints a = x_0 < x_1 < x_2 < ⋯ < x_n = b. Notice that there are n + 1 endpoints.

Name: CHOP
Left argument: The number of subintervals
Right argument: The endpoints of the interval to be subdivided, a vector with two components: (a, b)
Result: The vector of endpoints of the subintervals

Solution If [a, b] is to be divided into n subintervals, then the length of each of these subintervals is h = (b − a) ÷ n and the endpoints are x_k = a + kh, k = 0, 1, 2, ..., n.

In APL notation this becomes

      K←0,⍳N
      X←A+K×(B-A)÷N

Hence the function:

      ∇X←N CHOP AB
[1]   X←AB[1]+(0,⍳N)×(AB[2]-AB[1])÷N
      ∇
      10 CHOP 0 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
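A Python sketch of CHOP (illustrative; the name chop is hypothetical). Python's range starts at 0, so range(n + 1) plays the role of 0,⍳N.

```python
def chop(n, ab):
    """Endpoints a = x_0 < x_1 < ... < x_n = b of n equal subintervals of [a, b]."""
    a, b = ab
    h = (b - a) / n                              # length of each subinterval
    return [a + k * h for k in range(n + 1)]     # n + 1 endpoints, k = 0..n


print(chop(10, (0, 1)))
xs = chop(10, (-1, 1))
print([x ** 3 - 1 for x in xs])                  # values of x^3 - 1 for plotting
```

As in the APL session, the midpoint of [−1, 1] may come out as zero plus machine error rather than exactly zero on some systems.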


The function CHOP is useful for graphing. Suppose we wish to plot f(x) = x³ − 1 on the interval [−1, 1]:

      X←10 CHOP ¯1 1
      X
¯1 ¯0.8 ¯0.6 ¯0.4 ¯0.2 ¯1.73E¯18 0.2 0.4 0.6 0.8 1
      Y←¯1+X*3
      Y
¯2 ¯1.51 ¯1.22 ¯1.06 ¯1.01 ¯1 ¯0.992 ¯0.936 ¯0.784 ¯0.488 5.26E¯18

(The numbers X[6] and Y[11] above are zero plus machine error.) These points can now be plotted by hand or passed on to a plotting program.

The polynomial x³ − 1 has an especially simple form. For a general polynomial a slightly more elaborate expression of Y in terms of X is needed.


EXAMPLE 3.5

Name: AT
Left argument: Vector of coefficients of a polynomial, p(x) = a_0 + a_1x + ⋯ + a_nx^n, in the order a_0, a_1, ..., a_n
Right argument: Vector of x values x_1, x_2, ...
Result: Vector of values y_i = p(x_i)

Solution y_i = p(x_i) = a_0 + a_1x_i + a_2x_i² + ⋯ + a_nx_i^n = A[i;]+.×P, where P is the vector of coefficients and A is the matrix with i;j component equal to x_i^{j−1}. Notice that the degree of the polynomial p is ¯1+⍴P.

      ∇Y←P AT X
[1]   Y←(X∘.*¯1+⍳⍴P)+.×P
      ∇


The polynomial x³ − 1 of Example 3.4 is represented by the vector ¯1 0 0 1.

      ¯1 0 0 1 AT 10 CHOP ¯1 1
¯2 ¯1.51 ¯1.22 ¯1.06 ¯1.01 ¯1 ¯0.992 ¯0.936 ¯0.784 ¯0.488 5.26E¯18

Notice that the function AT also works when X is a single scalar. In this case X∘.*¯1+⍳⍴P is a vector. ∎
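The same evaluation scheme in a Python sketch (illustrative; the name at is hypothetical). It builds the table V[i][j] = x_i^j, the analogue of X∘.*¯1+⍳⍴P, and multiplies each row by the coefficient vector, which is what +.×P does.

```python
def at(p, xs):
    """Evaluate p(x) = p[0] + p[1]*x + ... + p[n]*x**n at every x in xs."""
    powers = range(len(p))                       # 0, 1, ..., degree
    V = [[x ** j for j in powers] for x in xs]   # outer power table
    return [sum(v * c for v, c in zip(row, p)) for row in V]


print(at([-1, 0, 0, 1], [-1, -0.5, 0, 0.5, 1]))  # x^3 - 1
```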


Defined functions may call other defined functions without limitation, and this is very useful. It means that a function need be defined only once; it does not have to be redefined in the body of other functions that may use it.

Consider, for example, the function ZSCORE defined in Example 3.3 above. On line ZSCORE[1] the average of the raw measurements is computed. A function to compute means is useful in itself.

      ∇M←AVE A
[1]   M←(+⌿A)÷(⍴A)[1]
      ∇


Similarly, a function for computing standard deviations is useful. The standard deviation squared is just the average of the squared differences from the mean.

      ∇S←SD A
[1]   S←(AVE(A-(⍴A)⍴AVE A)*2)*÷2
      ∇

The function ZSCORE may now be rewritten

      ∇Z←ZSCORE A
[1]   Z←(A-(⍴A)⍴AVE A)÷(⍴A)⍴SD A
      ∇


Using the same example as before:

      A
60 120
61 120
62 135
63 135
64 130
65 135
66 150
68 140
69 170
70 145
71 160
72 160
74 160
75 160
76 175
      AVE A
67.7 146
      SD A
5.04 16.8
      ZSCORE A
¯1.530 ¯1.570
¯1.340 ¯1.570
¯1.140 ¯0.675
¯0.939 ¯0.675
¯0.741 ¯0.973
¯0.542 ¯0.675
¯0.344  0.219
 0.0529 ¯0.377
 0.251  1.410
 0.450 ¯0.0795
 0.648  0.814
 0.847  0.814
 1.240  0.814
 1.440  0.814
 1.640  1.710


These are, of course, the same results that we obtained before. The new version of ZSCORE, however, is clearer, and we also have two other useful functions, AVE and SD.


EXERCISES 3.1

In exercises 1 through 7 a function y = f(x) is given. Define a monadic APL function with

Argument: A vector X = (x_1, ..., x_n)
Result: The vector Y = (y_1, ..., y_n) with y_i = f(x_i)

A sample table of values is given for each function so that the answer may be checked at a terminal, if one is available.


1 fay=l+e 2 
Ds 
3 4 
3. fx) = cos (x) + sin) 4 
a ee ee xj=3 -2 =! 0 
y lia 49 85 y Vs; Gas) ae 
5. ” 6 
Par Ro peta RTNOL ees 
y lb 99 779 y lt ot 46 
7. 


In exercises 8 through 11 a function z = f(x, y) is given. Define a dyadic APL function with

Left argument: A vector X
Right argument: A vector Y
Result: The vector Z = (z_i), with z_i = f(x_i, y_i)

A sample table of values is given for each function so that the answer may be checked at a terminal, if one is available.


8 fiuy) =x? ey? 1 9. + 08) +10 


1 2 3 


6 


mE 


Was 1428 19.96 




10. 


13, 


fix.y) = sin y 608. Me fix) 
a(n es 
a 
Fill hea: ow 
“sles a 28) 2 | 7 3-195 
A triangle with base b and height h has area ½bh.
(a) Define a dyadic APL function:
Name: AREA
Left argument: A vector of bases (b_1, b_2, ..., b_n)
Right argument: A vector of heights (h_1, h_2, ..., h_n)
Result: The vector of areas ½b_ih_i
(b) Define a dyadic APL function:
Name: TABLE
Left argument: A vector of bases (b_1, b_2, ..., b_n)
Right argument: A vector of heights (h_1, h_2, ..., h_m)
Result: A table (matrix) of values where the entry in the i;j position is the area of a triangle with base b_i and height h_j


Define a monadic APL function with

Name: TRIGTABLE
Argument: A vector of angles (θ_1, ..., θ_n) in radians
Result: A table (matrix) of values of the functions sin, cos, tan, sec, csc, cot at the angles θ_1, ..., θ_n

If a triangle has sides of length a, b, c, then the area of the triangle is

$$\sqrt{s(s-a)(s-b)(s-c)}, \qquad s = \tfrac{1}{2}(a + b + c)$$

(a) Define a monadic APL function with

Name: AREA2
Argument: The vector of side lengths (a, b, c)
Result: The area of the triangle with sides a, b, c

(b) Define a monadic APL function with

Name: AREA
Argument: A matrix T with three columns
Result: The vector with ith component equal to the area of a triangle with side lengths equal to the components of T[i;]


(a) Define a function for making least-squares polynomial fits to measured data. The function should have

Name: DEGFIT
Left argument: The degree of the polynomial to be fitted, a nonnegative integer
Right argument: A matrix consisting of the data points. The first column consists of the x coordinates of the measured points and the second column of the corresponding y coordinates
Result: The coefficients of the least-squares polynomial for the data, constant term first (see Proposition 2.16)

(Computer assignment) Use the function of part (a) to do the following exercises from Section 2.3.

(b) Exercise 7. (c) Exercise 8. (d) Exercise 10. (e) Exercise 11. (f) Exercise 12.
16. Let x_1, ..., x_n be a set of measurements with mean x̄. The rth moment about the mean is defined to be the number

$$m_r = \frac{(x_1 - \bar{x})^r + (x_2 - \bar{x})^r + \cdots + (x_n - \bar{x})^r}{n}$$

Here, m_2 is the variance (m_3 is called skewness and m_4 is kurtosis). Using the function AVE defined after Example 3.5, define a dyadic APL function with

Name: MOM
Right argument: r
Left argument: The vector (x_1, ..., x_n)
Result: m_r

Test data: For the data vector 2 3 7 8 10 one has variance = 9.2, skewness = ¯3.6, and kurtosis = 122.
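As a check on the definition of m_r, here is a Python sketch of the quantity itself (not the requested APL function; the name mom is hypothetical, with the data first and r second to match MOM's left and right arguments). The test data reproduce the stated values.

```python
def mom(xs, r):
    """r-th moment about the mean: the average of (x_i - mean)**r."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n


data = [2, 3, 7, 8, 10]
print(mom(data, 2), mom(data, 3), mom(data, 4))
```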


Simpson's rule is an approximation for the integral

$$\int_a^b f(x)\,dx$$

given in most elementary calculus books.

Divide the interval [a, b] into n subintervals (X←n CHOP a, b), where n is even, and define Y by Y[i] = f(X[i]). Let h be (b − a)/n. The approximation given by Simpson's rule is written conventionally as

$$\frac{h}{3}\left(y_0 + 4y_1 + 2y_2 + 4y_3 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n\right)$$

or, in APL, (H÷3)×Y+.×1,((N-1)⍴4 2),1
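A Python sketch of the composite rule (illustrative; the name simpson is hypothetical). The weight vector 1, 4, 2, 4, ..., 2, 4, 1 is built explicitly, following the pattern in the conventional formula.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]
    weights = [1] + [4 if k % 2 else 2 for k in range(1, n)] + [1]
    return h / 3 * sum(w * y for w, y in zip(weights, ys))


print(simpson(math.cos, 0.0, 1.0, 500), math.sin(1.0))
```

Simpson's rule is exact for cubic polynomials, which gives a convenient sanity check.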


17. Define a monadic APL function with

Name: SIN
Argument: A number x
Result: The Simpson's rule approximation to ∫_0^x cos u du with 500 subintervals

(Computer assignment) How closely does SIN x approximate sin(x) for x = 0,


3.2 Solving General Linear Equations


In Chapter 2 we discussed the equation

$$AX = B$$

under the assumption that A possessed a left inverse. In that case we saw that the only possible solution was given by X = (A^TA)^{-1}A^TB, or B⌹A. No algorithm for computing (A^TA)^{-1} was given for A with more than two columns, however.

In this section we will develop a procedure, Gaussian reduction to echelon form, to solve AX = B for general A and to compute a left inverse when one exists.

As an introduction to the method, consider the following system of equations:


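$$\begin{aligned} x + 2z &= 4 \\ -x + y + z &= 1 \\ x + 2y + 8z &= 14 \end{aligned} \qquad\text{or}\qquad \begin{bmatrix}1&0&2\\-1&1&1\\1&2&8\end{bmatrix}X = \begin{bmatrix}4\\1\\14\end{bmatrix}$$

If we add the first equation to the second and subtract the first equation from the third, we obtain

$$\begin{aligned} x + 2z &= 4 \\ y + 3z &= 5 \\ 2y + 6z &= 10 \end{aligned} \qquad\text{or}\qquad \begin{bmatrix}1&0&2\\0&1&3\\0&2&6\end{bmatrix}X = \begin{bmatrix}4\\5\\10\end{bmatrix}$$

If next we multiply the second equation by −2 and add the result to the third, we obtain

$$\begin{aligned} x + 2z &= 4 \\ y + 3z &= 5 \\ 0 &= 0 \end{aligned} \qquad\text{or}\qquad \begin{bmatrix}1&0&2\\0&1&3\\0&0&0\end{bmatrix}X = \begin{bmatrix}4\\5\\0\end{bmatrix}$$

The solutions of the last set of equations are easily described. There are infinitely many solutions. If z is given any arbitrary value, say t, then the values of the other variables are determined:

$$x = 4 - 2t, \quad y = 5 - 3t, \quad z = t \qquad\text{or}\qquad X = \begin{bmatrix}4\\5\\0\end{bmatrix} + t\begin{bmatrix}-2\\-3\\1\end{bmatrix}$$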

Does this give the solutions of the original set of equations? In this case it is easy to check:

$$\begin{bmatrix}1&0&2\\-1&1&1\\1&2&8\end{bmatrix}\left(\begin{bmatrix}4\\5\\0\end{bmatrix} + t\begin{bmatrix}-2\\-3\\1\end{bmatrix}\right) = \begin{bmatrix}4\\1\\14\end{bmatrix} + t\begin{bmatrix}0\\0\\0\end{bmatrix} = \begin{bmatrix}4\\1\\14\end{bmatrix}$$

We do not yet know, however, that these are all the solutions. Two questions need to be answered: Will the method always work? What precisely is the method?
Proposition 2.13 shows that multiplying the equation AX = B on the left by 
the matrix P will not change the set of solutions as long as P has a left inverse. The 
manipulations above can be interpreted as such multiplications. 

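In the first step of the process above the first equation was left unchanged, the second equation was replaced by the second equation plus the first, and the third equation was replaced by the third equation minus the first.

Consider the matrix

$$P_1 = \begin{bmatrix}1&0&0\\1&1&0\\-1&0&1\end{bmatrix}$$

If we multiply the matrix version of the original set of equations by P_1, we have

$$\begin{bmatrix}1&0&0\\1&1&0\\-1&0&1\end{bmatrix}\begin{bmatrix}1&0&2\\-1&1&1\\1&2&8\end{bmatrix}X = \begin{bmatrix}1&0&0\\1&1&0\\-1&0&1\end{bmatrix}\begin{bmatrix}4\\1\\14\end{bmatrix}$$

or

$$\begin{bmatrix}1&0&2\\0&1&3\\0&2&6\end{bmatrix}X = \begin{bmatrix}4\\5\\10\end{bmatrix}$$

Further, the matrix P_1 is invertible with

$$P_1^{-1} = \begin{bmatrix}1&0&0\\-1&1&0\\1&0&1\end{bmatrix}$$

and so by Proposition 2.13 we have not changed the set of solutions.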
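In the second step multiply the second equation by −2 and add it to the third equation. This corresponds to multiplying by

$$P_2 = \begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix}$$

which is invertible with inverse

$$P_2^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\0&2&1\end{bmatrix}$$

Multiplying the matrix form of the second set of equations by P_2 gives

$$\begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix}\begin{bmatrix}1&0&2\\0&1&3\\0&2&6\end{bmatrix}X = \begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix}\begin{bmatrix}4\\5\\10\end{bmatrix}$$

or

$$\begin{bmatrix}1&0&2\\0&1&3\\0&0&0\end{bmatrix}X = \begin{bmatrix}4\\5\\0\end{bmatrix}$$

Thus the final set of equations has the same solutions as the original set.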
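We could accomplish the procedure in one step by multiplying the matrix version of the original set of equations by

$$P = P_2P_1 = \begin{bmatrix}1&0&0\\1&1&0\\-3&-2&1\end{bmatrix}$$

which is invertible with inverse

$$G = (P_2P_1)^{-1} = P_1^{-1}P_2^{-1} = \begin{bmatrix}1&0&0\\-1&1&0\\1&2&1\end{bmatrix}$$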

The matrices P_1, P_2 are called pivot matrices. Here is how they are used in general. Consider the matrix

$$M = \begin{bmatrix}1&-4&1\\-1&2&1\\0&3&1\end{bmatrix}$$

We wish to use the 3 in the 3;2 position to replace the −4 and 2 in the column M[;2] by zeros. We do this by adding appropriate multiples of row M[3;] to rows M[2;] and M[1;]. This operation is called pivoting on the 3;2 entry. In this case we need −2/3 times M[3;] added to M[2;] and 4/3 times M[3;] added to M[1;]. We can accomplish this by multiplying by the matrix

$$P = \begin{bmatrix}1&0&\tfrac43\\0&1&-\tfrac23\\0&0&1\end{bmatrix}$$

A pivot matrix is a modified identity matrix, I, in which the off-diagonal elements of a single column have been changed. If one wishes to pivot on the i;j entry of the matrix M, then the off-diagonal entries of I[;i] are replaced by the entries of −M[;j]÷M[i;j] with the ith entry dropped out. We obtain the inverse of a pivot matrix by changing the signs of the off-diagonal entries.
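The rule just stated can be sketched in Python with exact fractions (an illustration; pivot_matrix and the other names are hypothetical). Pivoting on the 1;1 entry of the coefficient matrix of the first system in this section, [[1, 0, 2], [−1, 1, 1], [1, 2, 8]], should give back P_1 = [[1, 0, 0], [1, 1, 0], [−1, 0, 1]], and flipping the off-diagonal signs should give its inverse.

```python
from fractions import Fraction


def pivot_matrix(M, i, j):
    """Pivot matrix for the i;j entry of M (indices 1-based, as in the text).

    Start from the identity and replace the off-diagonal entries of
    column i by -M[;j] / M[i;j]; the diagonal entry stays 1.
    """
    piv = Fraction(M[i - 1][j - 1])
    if piv == 0:
        raise ZeroDivisionError("cannot pivot on a zero entry")
    m = len(M)
    P = [[Fraction(int(r == c)) for c in range(m)] for r in range(m)]
    for r in range(m):
        if r != i - 1:
            P[r][i - 1] = -Fraction(M[r][j - 1]) / piv
    return P


def invert_pivot(P, i):
    """Inverse of a pivot matrix: negate the off-diagonal entries of column i."""
    Q = [row[:] for row in P]
    for r in range(len(Q)):
        if r != i - 1:
            Q[r][i - 1] = -Q[r][i - 1]
    return Q


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 0, 2], [-1, 1, 1], [1, 2, 8]]
P1 = pivot_matrix(A, 1, 1)
print(P1)
print(matmul(P1, A))       # first column becomes 1, 0, 0
```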


EXAMPLE 3.6 Let 


Write the pivot matrices, and their inverses, for pivoting on the 2;1, 1;2, 3;3, and 2;4 entries.

Solution To pivot on 2;1: Starting with ID 3 (see Section 3.1), we must replace the off-diagonal entries of (ID 3)[;2] by −M[;1]÷M[2;1], or -1 2 4÷2, with the second entry dropped out:

$$P = \begin{bmatrix}1&-\tfrac12&0\\0&1&0\\0&-2&1\end{bmatrix} \quad\text{and}\quad P^{-1} = \begin{bmatrix}1&\tfrac12&0\\0&1&0\\0&2&1\end{bmatrix}$$

To pivot on 1;2: This time the off-diagonal entries of (ID 3)[;1] are replaced by −M[;2]÷M[1;2] with the first entry dropped out:

$$P = \begin{bmatrix}1&0&0\\3&1&0\\-4&0&1\end{bmatrix} \quad\text{and}\quad P^{-1} = \begin{bmatrix}1&0&0\\-3&1&0\\4&0&1\end{bmatrix}$$




To pivot on 3;3:


u 


To pivot on 2;4:


140 0 
p=|0 1 0], p+=lo 10] o 
o qt 0-31 


By multiplying by pivot matrices, an equation of the form AX = B can be 
transformed into an equation with self-evident solutions. This is because multipli- 
cation by a pivot matrix eliminates a variable from all but one of the equations of 
the corresponding system of linear equations. 


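EXAMPLE 3.7 Solve the system of equations

$$\begin{aligned} x_1 + x_2 + 2x_4 &= 6 \\ 3x_1 + 2x_2 + x_3 + x_4 &= 5 \\ 9x_1 + 5x_2 + 4x_3 + x_4 &= 11 \\ 4x_1 + 2x_2 + 2x_3 &= 4 \end{aligned}$$

Solution The corresponding matrix system is

$$\begin{bmatrix}1&1&0&2\\3&2&1&1\\9&5&4&1\\4&2&2&0\end{bmatrix}X = \begin{bmatrix}6\\5\\11\\4\end{bmatrix}$$

Pivoting on the 1;1 entry, we have

$$\begin{bmatrix}1&0&0&0\\-3&1&0&0\\-9&0&1&0\\-4&0&0&1\end{bmatrix}\begin{bmatrix}1&1&0&2\\3&2&1&1\\9&5&4&1\\4&2&2&0\end{bmatrix}X = \begin{bmatrix}1&0&0&0\\-3&1&0&0\\-9&0&1&0\\-4&0&0&1\end{bmatrix}\begin{bmatrix}6\\5\\11\\4\end{bmatrix}$$

or

$$\begin{bmatrix}1&1&0&2\\0&-1&1&-5\\0&-4&4&-17\\0&-2&2&-8\end{bmatrix}X = \begin{bmatrix}6\\-13\\-43\\-20\end{bmatrix}$$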

Notice the form of the corresponding system of linear equations:

$$\begin{aligned} x_1 + x_2 + 2x_4 &= 6 \\ -x_2 + x_3 - 5x_4 &= -13 \\ -4x_2 + 4x_3 - 17x_4 &= -43 \\ -2x_2 + 2x_3 - 8x_4 &= -20 \end{aligned}$$

The variable x_1 has been eliminated from all but the first equation. Pivoting on any position in the second column will eliminate x_2 from all but one equation. If we pivot on 1;2, however, x_1 will be reintroduced into the other equations. Therefore we should pivot on the 2;2, 3;2, or 4;2 positions. To avoid fractions we pivot on the 2;2 position. In this case we first divide by −1 = M[2;2] and then change signs:

$$\begin{bmatrix}1&1&0&0\\0&1&0&0\\0&-4&1&0\\0&-2&0&1\end{bmatrix}\begin{bmatrix}1&1&0&2\\0&-1&1&-5\\0&-4&4&-17\\0&-2&2&-8\end{bmatrix}X = \begin{bmatrix}1&1&0&0\\0&1&0&0\\0&-4&1&0\\0&-2&0&1\end{bmatrix}\begin{bmatrix}6\\-13\\-43\\-20\end{bmatrix}$$

or

$$\begin{bmatrix}1&0&1&-3\\0&-1&1&-5\\0&0&0&3\\0&0&0&2\end{bmatrix}X = \begin{bmatrix}-7\\-13\\9\\6\end{bmatrix}$$

Now the variables x_1 and x_2 each appear in only one equation. The variable x_3 appears in two, but trying to eliminate it from one of the equations at this point would put x_1 or x_2 back into another equation. We may eliminate the variable x_4 from all but one equation by pivoting on the 3;4 or 4;4 position. Pivoting on the 3;4 position, we have

$$\begin{bmatrix}1&0&1&0\\0&1&\tfrac53&0\\0&0&1&0\\0&0&-\tfrac23&1\end{bmatrix}\begin{bmatrix}1&0&1&-3\\0&-1&1&-5\\0&0&0&3\\0&0&0&2\end{bmatrix}X = \begin{bmatrix}1&0&1&0\\0&1&\tfrac53&0\\0&0&1&0\\0&0&-\tfrac23&1\end{bmatrix}\begin{bmatrix}-7\\-13\\9\\6\end{bmatrix}$$

or

$$\begin{bmatrix}1&0&1&0\\0&-1&1&0\\0&0&0&3\\0&0&0&0\end{bmatrix}X = \begin{bmatrix}2\\2\\9\\0\end{bmatrix}$$


Things cannot be improved further by pivoting. The corresponding system of equations is

$$\begin{aligned} x_1 + x_3 &= 2 \\ -x_2 + x_3 &= 2 \\ 3x_4 &= 9 \\ 0 &= 0 \end{aligned}$$

There are infinitely many solutions. Once a value for the variable x_3 is chosen, the other variables are determined. Setting x_3 = t, say, we have

$$x_1 = 2 - t, \quad x_2 = -2 + t, \quad x_3 = t, \quad x_4 = 3$$

∎
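The three pivots of Example 3.7 can be replayed on the partitioned matrix [A|B] with a short Python sketch (illustrative only; pivot is a hypothetical helper that performs the row operations equivalent to multiplying by the pivot matrix, using exact fractions). It then checks the one-parameter family of solutions against the original equations.

```python
from fractions import Fraction


def pivot(M, i, j):
    """Return P M, where P is the pivot matrix for the i;j entry (1-based).

    Equivalently, add -M[r;j]/M[i;j] times row i to every other row r.
    """
    M = [[Fraction(v) for v in row] for row in M]
    i, j = i - 1, j - 1
    piv = M[i][j]
    if piv == 0:
        raise ZeroDivisionError("cannot pivot on a zero entry")
    for r in range(len(M)):
        if r != i:
            factor = -M[r][j] / piv
            M[r] = [a + factor * b for a, b in zip(M[r], M[i])]
    return M


AB = [[1, 1, 0, 2, 6],
      [3, 2, 1, 1, 5],
      [9, 5, 4, 1, 11],
      [4, 2, 2, 0, 4]]
R = AB
for i, j in [(1, 1), (2, 2), (3, 4)]:
    R = pivot(R, i, j)
for row in R:
    print(" ".join(str(v) for v in row))

# Residuals of x = (2 - t, -2 + t, t, 3) in the original system.
for t in (-1, 0, 2, 5):
    x = [2 - t, t - 2, t, 3]
    print(t, [sum(a * xi for a, xi in zip(row[:4], x)) - row[4] for row in AB])
```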


Besides pivot matrices, we will use two other types of matrices. 
A switch matrix is obtained by interchanging two rows of an identity matrix. 
is the switch matrix obtained by interchanging rows / and j of /D n, then SM 
is just M with rows / and / interchanged, Notice that $= S~ 

For example, if we interchange rows 2 and 4 in ID 4 to get S, 


10.0 Olfmg, my my <=] fot, mye ma» 
0-0-0 A] fray may may Ma Ma Mig 


webs 


My Myr Mss Ms, Msy May 
my Me May My, May May 

A multiplier matrix is obtained by replacing a diagonal entry of 1D n by 
nonzero scalar. If L is a multiplier matrix with A = L{i; 7}, then LM is M with 
Mlii] replaced by AMUi;}: 


10 Olfmg, mats my Whit tag) Mma 
0-1 Olfimgy tng mag oe [=| may mgs may == 
0-0 ASL mye mys eb [Amyy Amy Arygos 


A multiplier matrix is invertible. If L is defined by L[i;i] = λ, then L⁻¹ is the
multiplier matrix with λ⁻¹ in the i;i position.

We will refer to pivot, switch, and multiplier matrices collectively as elementary
matrices. The inverse of an elementary matrix is an elementary matrix of the
same type.

Gaussian reduction uses elementary matrices to reduce a system of linear
equations to a standard form. The standard form we will use is called the echelon
form.
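A quick check of the multiplier-matrix facts, sketched in Python with exact fractions (the names `multiplier_matrix` and `matmul` are ours, not the text's): multiplying by L scales one row, and the multiplier matrix with λ⁻¹ in the same position undoes it.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def multiplier_matrix(n, i, lam):
    """ID n with the i;i entry (1-based) replaced by the nonzero scalar lam."""
    if lam == 0:
        raise ValueError("a multiplier matrix needs a nonzero scalar")
    L = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    L[i - 1][i - 1] = Fraction(lam)
    return L

lam = Fraction(3)
L = multiplier_matrix(3, 3, lam)
L_inv = multiplier_matrix(3, 3, 1 / lam)   # lambda^-1 in the same position
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matmul(L, M))        # row 3 of M multiplied by 3
print(matmul(L, L_inv))    # the identity
```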


32 Solving General Linear Equations 139 


Definition 3.2 A matrix E is said to be in row-echelon form if


1. The zero rows of E, if any, are at the bottom; that is, no row of zeros has a
nonzero row below it.


2. The first nonzero entry in each nonzero row is a 1. These 1's are called leading
ones, and the columns containing the leading ones are called pivot columns.


3. The leading 1 is the only nonzero entry in a pivot column.


4. The leading 1 in a given row is to the right of the leading 1's in the rows
above; that is, if E[i;j] and E[k;l] are leading 1's and i < k, then j < l.
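The four conditions translate directly into a test. The following Python sketch (the names `is_echelon` and `pivot_columns` are illustrative, not from the text) uses exact comparisons, so it is meant for exact integer or rational entries; results computed in floating point need a tolerance instead.

```python
def is_echelon(E):
    """Check conditions 1-4 of the definition of row-echelon form."""
    lead = []                  # column of the leading 1 in each nonzero row
    zero_row_seen = False
    for row in E:
        nonzero = [k for k, x in enumerate(row) if x != 0]
        if not nonzero:
            zero_row_seen = True
            continue
        if zero_row_seen:                  # condition 1: zero rows at the bottom
            return False
        k = nonzero[0]
        if row[k] != 1:                    # condition 2: a leading 1
            return False
        if lead and k <= lead[-1]:         # condition 4: leading 1's move right
            return False
        lead.append(k)
    # condition 3: each leading 1 is the only nonzero entry in its column
    return all(sum(1 for row in E if row[k] != 0) == 1 for k in lead)

def pivot_columns(E):
    """1-based indices of the pivot columns of a matrix in echelon form."""
    return [next(k for k, x in enumerate(row) if x != 0) + 1
            for row in E if any(x != 0 for x in row)]

print(is_echelon([[1, 0, 1, 0], [0, -1, 1, 0], [0, 0, 0, 3], [0, 0, 0, 0]]))
print(pivot_columns([[1, 0, 1, 0], [0, 1, -1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]))
```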


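Identity matrices and zero matrices are in row echelon form. The matrix

$$\begin{bmatrix}1&0&1&0\\0&-1&1&0\\0&0&0&3\\0&0&0&0\end{bmatrix}$$

from Example 3.7 is not in row echelon form because the first nonzero entries in
rows 2 and 3 are not 1's. If we multiply by the appropriate multiplier matrices, we
can get an echelon form:

$$\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&\tfrac13&0\\0&0&0&1\end{bmatrix}\begin{bmatrix}1&0&0&0\\0&-1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}\begin{bmatrix}1&0&1&0\\0&-1&1&0\\0&0&0&3\\0&0&0&0\end{bmatrix}=\begin{bmatrix}1&0&1&0\\0&1&-1&0\\0&0&0&1\\0&0&0&0\end{bmatrix}$$

The pivot columns are 1, 2, and 4.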

In row-reducing a system of equations AX = B, we will not carry the whole
equation along. We will just work with the partitioned matrix A,B, which is
usually written [A|B]. By the definition of matrix multiplication we have
(M+.×N)[;K] = M+.×N[;K]. This means in particular that

$$P[A\,|\,B]=[PA\,|\,PB]$$

Instead of multiplying the equation AX = B by the elementary matrix P, we
can multiply the partitioned matrix [A|B] by P.

The product of the elementary matrices used to reduce the system can be
useful. We will keep track of this product by using the partitioned matrix [A|B|I],
where I = ID(⍴A)[1]. Then, as we multiply by elementary matrices, we have

$$\begin{aligned}P_1[A\,|\,B\,|\,I]&=[P_1A\,|\,P_1B\,|\,P_1]\\ P_2[P_1A\,|\,P_1B\,|\,P_1]&=[P_2P_1A\,|\,P_2P_1B\,|\,P_2P_1]\end{aligned}$$

and so on, and the last block accumulates the products of the elementary matrices.
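The bookkeeping just described is easy to mechanize. Below is a minimal Python sketch, assuming exact rational arithmetic via `fractions.Fraction`; the function name `gaussian_reduction` and its pivot rule (first usable row, and the leading 1 is set before clearing the column rather than after) are our own choices. It carries [A|B|I] along and returns E, F, and G. The test data are the matrix and right side of Example 3.8.

```python
from fractions import Fraction

def gaussian_reduction(A, B):
    """Reduce [A|B|I] until the A block is in row-echelon form.

    Returns (E, F, G) with E = G A and F = G B. Pivots are taken only in
    the columns of A; in each column the first usable row is chosen.
    """
    m, n, nb = len(A), len(A[0]), len(B[0])
    M = [[Fraction(x) for x in A[i]] + [Fraction(x) for x in B[i]]
         + [Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    r = 0                                    # next row to receive a leading 1
    for k in range(n):
        p = next((i for i in range(r, m) if M[i][k] != 0), None)
        if p is None:                        # nothing to pivot on in column k
            continue
        M[r], M[p] = M[p], M[r]              # switch
        c = M[r][k]
        M[r] = [x / c for x in M[r]]         # multiplier: make a leading 1
        for i in range(m):                   # pivot: clear the rest of column k
            if i != r and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    E = [row[:n] for row in M]
    F = [row[n:n + nb] for row in M]
    G = [row[n + nb:] for row in M]
    return E, F, G

A = [[1, -1, -2], [-2, 3, 5], [1, -2, -3], [0, -2, -2]]
B = [[-5], [9], [0], [2]]
E, F, G = gaussian_reduction(A, B)
```

Because this rule needs no switches or scalings on these data, it reproduces the same G as the hand computation.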




The complete description of the algorithm as we will use it is:


Gaussian reduction Given the matrix equation AX = B, form the partitioned
matrix [A|B|I], where I is an identity matrix with the same number of rows as A.

By multiplying by elementary matrices, obtain the partitioned matrix
[E|F|G], where E is in echelon form. The matrix G is invertible, and multiplying
AX = B on the left by G gives EX = F. Since G is invertible, EX = F and AX = B
have the same solutions.


Example 3.8 Solve

$$\begin{aligned}x - y - 2z &= -5\\ -2x + 3y + 5z &= 9\\ x - 2y - 3z &= 0\\ -2y - 2z &= 2\end{aligned}$$

by Gaussian reduction. Use G to check the arithmetic.


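Solution Begin with the partitioned matrix

$$\left[\begin{array}{rrr|r|rrrr}1&-1&-2&-5&1&0&0&0\\-2&3&5&9&0&1&0&0\\1&-2&-3&0&0&0&1&0\\0&-2&-2&2&0&0&0&1\end{array}\right]$$

Pivoting on 1;1,

$$\begin{bmatrix}1&0&0&0\\2&1&0&0\\-1&0&1&0\\0&0&0&1\end{bmatrix}\left[\begin{array}{rrr|r|rrrr}1&-1&-2&-5&1&0&0&0\\-2&3&5&9&0&1&0&0\\1&-2&-3&0&0&0&1&0\\0&-2&-2&2&0&0&0&1\end{array}\right]=\left[\begin{array}{rrr|r|rrrr}1&-1&-2&-5&1&0&0&0\\0&1&1&-1&2&1&0&0\\0&-1&-1&5&-1&0&1&0\\0&-2&-2&2&0&0&0&1\end{array}\right]$$

Pivoting on 2;2,

$$\begin{bmatrix}1&1&0&0\\0&1&0&0\\0&1&1&0\\0&2&0&1\end{bmatrix}\left[\begin{array}{rrr|r|rrrr}1&-1&-2&-5&1&0&0&0\\0&1&1&-1&2&1&0&0\\0&-1&-1&5&-1&0&1&0\\0&-2&-2&2&0&0&0&1\end{array}\right]=\left[\begin{array}{rrr|r|rrrr}1&0&-1&-6&3&1&0&0\\0&1&1&-1&2&1&0&0\\0&0&0&4&1&1&1&0\\0&0&0&0&4&2&0&1\end{array}\right]=[E\,|\,F\,|\,G]$$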


The equation EX = F has no solution. Indeed, the corresponding system of equa-
tions is

$$\begin{aligned}x - z &= -6\\ y + z &= -1\\ 0x + 0y + 0z &= 4\\ 0x + 0y + 0z &= 0\end{aligned}$$

and the third equation is not satisfied for any values of x, y, z.
To check the arithmetic using G, one multiplies AX = B on the left by G. If
the result is not EX = F, then an arithmetic mistake has been made.


$$G[A\,|\,B]=\begin{bmatrix}3&1&0&0\\2&1&0&0\\1&1&1&0\\4&2&0&1\end{bmatrix}\left[\begin{array}{rrr|r}1&-1&-2&-5\\-2&3&5&9\\1&-2&-3&0\\0&-2&-2&2\end{array}\right]=\left[\begin{array}{rrr|r}1&0&-1&-6\\0&1&1&-1\\0&0&0&4\\0&0&0&0\end{array}\right]$$


Once an invertible matrix G has been found for which E = GA is in echelon
form, this G may be used to solve equations of the form AX = B for different
matrices B.


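Example 3.9 Solve

$$\begin{bmatrix}1&-1&-2\\-2&3&5\\1&-2&-3\\0&-2&-2\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}-1\\4\\-3\\-4\end{bmatrix}$$


Solution The matrix on the left is the same as the matrix of Example 3.8. Thus
we multiply the equation AX = B above by the matrix G computed in Example
3.8. We do not need to perform the computation GA, because we know that
GA = E. Multiplication on the left by G gives

$$\begin{bmatrix}1&0&-1\\0&1&1\\0&0&0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}3&1&0&0\\2&1&0&0\\1&1&1&0\\4&2&0&1\end{bmatrix}\begin{bmatrix}-1\\4\\-3\\-4\end{bmatrix}=\begin{bmatrix}1\\2\\0\\0\end{bmatrix}$$

The corresponding system of equations is

$$\begin{aligned}x - z &= 1\\ y + z &= 2\end{aligned}$$

The variable z can be chosen arbitrarily, and then the other variables are
determined. Say z = t; then

$$x = 1 + t,\qquad y = 2 - t$$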


The above pattern is a general one. The variables corresponding to nonpivot
columns of E can be chosen arbitrarily. Once these variables are assigned values,
the variables corresponding to the pivot columns are determined.

Suppose, for example, that in solving a system AX = B, the partitioned ma-
trix [A|B] has been reduced to

$$[E\,|\,F]=\left[\begin{array}{rrrrrr|r}1&0&0&0&2&0&5\\0&1&1&0&3&0&6\\0&0&0&1&4&0&7\\0&0&0&0&0&1&8\end{array}\right]$$

There are four leading 1's in this echelon form, in positions 1;1, 2;2, 3;4, and 4;6.
The pivot columns are thus E[;1 2 4 6]. Notice that E[;3] is not a pivot column,
since it does not contain a leading 1. Let X = (x₁, x₂, x₃, x₄, x₅, x₆). The nonpivot
columns of E are E[;3 5], and x₃, x₅ may be chosen arbitrarily, say x₃ = t₁ and
x₅ = t₂. The system of linear equations corresponding to [E|F] then becomes

$$\begin{aligned}x_1 + 2t_2 &= 5\\ x_2 + t_1 + 3t_2 &= 6\\ x_4 + 4t_2 &= 7\\ x_6 &= 8\end{aligned}$$


which gives

$$X=\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix}=\begin{bmatrix}5-2t_2\\6-t_1-3t_2\\t_1\\7-4t_2\\t_2\\8\end{bmatrix}=\begin{bmatrix}5\\6\\0\\7\\0\\8\end{bmatrix}+t_1\begin{bmatrix}0\\-1\\1\\0\\0\\0\end{bmatrix}+t_2\begin{bmatrix}-2\\-3\\0\\-4\\1\\0\end{bmatrix}$$


In the last form, notice that if B is changed in the original equation, only the first
term of X is affected. The terms involving the arbitrary parameters are
unchanged. If B = 0, then F = GB = 0, and the first term of X disappears. This
means that the terms involving the arbitrary parameters are the solutions of
AX = 0. The vectors multiplied by the arbitrary parameters are linearly
independent (exercise 67).
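Reading the solutions off [E|F] is itself mechanical. A minimal Python sketch, assuming [E|F] is given with exact entries and E in row-echelon form; the name `solve_echelon` is ours. It returns the constant vector and one vector for each nonpivot column, or `None` when some row reads 0 = nonzero. The test uses the 4-by-6 matrix below, taking its second row to be 0 1 1 0 3 0 | 6.

```python
from fractions import Fraction

def solve_echelon(EF):
    """Return (x0, basis) with X = x0 + t1*basis[0] + ..., or None if inconsistent."""
    n = len(EF[0]) - 1                  # number of unknowns
    lead = {}                           # pivot column -> row holding its leading 1
    for r, row in enumerate(EF):
        nonzero = [k for k in range(n) if row[k] != 0]
        if nonzero:
            lead[nonzero[0]] = r
        elif row[n] != 0:
            return None                 # the equation 0 = row[n] has no solution
    x0 = [Fraction(0)] * n
    for k, r in lead.items():
        x0[k] = Fraction(EF[r][n])
    basis = []
    for f in (k for k in range(n) if k not in lead):   # nonpivot columns
        v = [Fraction(0)] * n
        v[f] = Fraction(1)              # the arbitrary variable itself
        for k, r in lead.items():
            v[k] = -Fraction(EF[r][f])  # each pivot variable absorbs -E[r;f] t
        basis.append(v)
    return x0, basis

EF = [[1, 0, 0, 0, 2, 0, 5],
      [0, 1, 1, 0, 3, 0, 6],
      [0, 0, 0, 1, 4, 0, 7],
      [0, 0, 0, 0, 0, 1, 8]]
x0, basis = solve_echelon(EF)
```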


COMPRESSION, TAKE, DROP 


We need some notation for deleting entries from vectors and also for deleting
rows and columns from matrices.

One way to delete an entry from a vector is to use compression. The compres-
sion function is denoted by /. The form is L/V, where V is the vector and L is a
vector of zeros and ones (such vectors are sometimes called logical vectors or
masks). The entries of V corresponding to the zeros of L are deleted.


      1 0 1/1 2 3          delete second component
1 3
      0 1 0/1 2 3          delete first and third components
2
      1 1 1/1 2 3          delete no components
1 2 3
      0 0 0/1 2 3          delete all components (leaving ⍳0)

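For readers following along in another language, compression is a one-line filter. This Python sketch (the name `compress` is ours) mirrors L/V for a 0-1 mask; like APL, it insists that the mask and the vector have the same length.

```python
def compress(L, V):
    """Mimic the APL expression L/V for a mask L of zeros and ones."""
    if len(L) != len(V):
        raise ValueError("mask and vector must have the same length")
    return [v for keep, v in zip(L, V) if keep]

V = [1, 2, 3]
print(compress([1, 0, 1], V))   # second component deleted
print(compress([0, 0, 0], V))   # everything deleted: an empty list
# A mask built from a relational test.
W = [5, 1, 4, 2]
print(compress([int(w > 2) for w in W], W))
```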


The logical vectors L are usually created by using the logical or relational
functions


2443 
Another selection function is take, denoted by ↑. The expression 3↑V means
"Take the first three components of V," and ¯3↑V means "Take the last three
components." The function drop, denoted by ↓, works similarly.


ans 




“3105 
456 
aus 
45 
aus 
12 


If A is a matrix, then L/A drops columns and L⌿A drops rows. The expression
1 2↑A means "Take 1 row and 2 columns," and 0 ¯4↓A means "Drop 4 columns
from the right end of the matrix."
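Take and drop map onto Python slicing. In the sketch below the names `take`, `drop`, `take2`, and `drop2` are ours, and one simplification is deliberate: APL's take pads with zeros when asked for more items than exist, which this sketch refuses rather than imitates.

```python
def take(n, V):
    """n↑V: the first n items, or the last -n items when n < 0 (no padding)."""
    if abs(n) > len(V):
        raise ValueError("overtake (zero padding) is not handled in this sketch")
    return list(V[:n]) if n >= 0 else list(V[n:])

def drop(n, V):
    """n↓V: remove the first n items, or the last -n items when n < 0."""
    return list(V[n:]) if n >= 0 else list(V[:n])

def take2(r, c, A):
    """r c↑A for a matrix stored as a list of rows."""
    return [take(c, row) for row in take(r, A)]

def drop2(r, c, A):
    """r c↓A for a matrix stored as a list of rows."""
    return [drop(c, row) for row in drop(r, A)]

V = [1, 2, 3, 4, 5, 6]
print(take(3, V), take(-3, V), drop(2, V), drop(-4, V))
A = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
print(take2(1, 2, A))    # take 1 row and 2 columns
print(drop2(0, -4, A))   # drop 4 columns from the right end
```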

Now consider the definition of the pivot matrix P needed to pivot on the 1;3
entry, say, of a matrix A.

Start with an identity matrix with the same number of rows, N, as A:


      P←ID N←1↑⍴A


Next we need to change the column P[;1]. Each entry but the first should be
replaced by the corresponding entry of −A[;3]÷A[1;3]:


      P[(1≠⍳N)/⍳N;1]←-((1≠⍳N)/A[;3])÷A[1;3]


We can now write a function that performs the pivot operation on a given 
matrix. The right argument will be the matrix being row-reduced and the left 
argument will be a vector giving the pivot position. 


    ∇ Z←X PIVOT A;P;N
[1]   P←ID N←1↑⍴A
[2]   P[(X[1]≠⍳N)/⍳N;X[1]]←-((X[1]≠⍳N)/A[;X[2]])÷A[X[1];X[2]]
[3]   Z←P+.×A
    ∇

      A
1 2 3
4 5 6
7 8 9
      2 2 PIVOT A
¯6.00E¯1 3.47E¯18  6.00E¯1
 4.00E0  5.00E0    6.00E0
 6.00E¯1 1.39E¯17 ¯6.00E¯1


Notice the round-off error in column 2 of the result. The precise form of this
column will vary with the computer used. Since the 1;2 and 3;2 entries are less
than ⎕CT times the entries of A, we may take them to be zero. (Recall from
Section 2.2 that ⎕CT is the fuzz factor, the comparison tolerance.)
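The same computation can be sketched in floating-point Python (the name `pivot` is ours, and the tolerance 1E¯13 is an assumed stand-in for ⎕CT). Whether tiny nonzero entries actually appear in column 2 depends on the hardware's arithmetic; on IEEE double-precision machines they may come out exactly zero, which is why the sketch tests entries against a tolerance rather than against 0.

```python
def pivot(x, A):
    """Floating-point analogue of X PIVOT A: build the pivot matrix P for the
    x[0];x[1] entry (1-based) and return P times A."""
    i, j = x[0] - 1, x[1] - 1
    n = len(A)
    P = [[float(r == c) for c in range(n)] for r in range(n)]
    for r in range(n):
        if r != i:
            P[r][i] = -A[r][j] / A[i][j]   # column i of P, as in line [2] of PIVOT
    return [[sum(P[r][s] * A[s][c] for s in range(n)) for c in range(len(A[0]))]
            for r in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Z = pivot((2, 2), A)
ct = 1e-13                                   # an assumed fuzz factor
scale = max(abs(a) for row in A for a in row)
for row in Z:
    # entries this small relative to A are shown as zero, in the spirit of ⎕CT
    print([0.0 if abs(v) < ct * scale else v for v in row])
```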

A function that interchanges rows X[1] and X[2] of the matrix A is 




    ∇ Z←X SWITCH A
[1]   A[X;]←A[X[2 1];]
[2]   Z←A
    ∇

      A
1 2 3
4 5 6
7 8 9
      1 3 SWITCH A
7 8 9
4 5 6
1 2 3


Notice that a switch matrix need not be created. Any function that performs
the same task as multiplying by a switch matrix will do. The third kind of elemen-
tary matrix is the multiplier matrix. We need multiplier matrices only to create
leading 1's when the first nonzero entry of a row is not equal to 1. For our
purposes, then, it is sufficient to have a function that will put a 1 in a given
position, the i;j position, say, by dividing the ith row by the i;j entry. Call the
function LDR for leader.


    ∇ Z←X LDR A
[1]   A[X[1];]←A[X[1];]÷A[X[1];X[2]]
[2]   Z←A
    ∇

      2 2 LDR A
1.000 2.000 3.000
0.800 1.000 1.200
7.000 8.000 9.000


Note: Copies of the functions PIVOT, SWITCH, and LDR should be saved
in a workspace file for future use.
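For comparison, here are Python counterparts of SWITCH and LDR (the names `switch` and `leader` are ours). Because an APL function receives its arguments by value, SWITCH and LDR never alter the caller's A; the sketch copies A for the same reason.

```python
def switch(x, A):
    """Like X SWITCH A: a copy of A with rows x[0] and x[1] (1-based) interchanged."""
    i, j = x[0] - 1, x[1] - 1
    Z = [list(row) for row in A]
    Z[i], Z[j] = Z[j], Z[i]
    return Z

def leader(x, A):
    """Like X LDR A: divide row x[0] by its x[0];x[1] entry, making that entry 1."""
    i, j = x[0] - 1, x[1] - 1
    Z = [list(row) for row in A]
    c = Z[i][j]
    Z[i] = [a / c for a in Z[i]]
    return Z

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(switch((1, 3), A))
print(leader((2, 2), A))
print(A)                  # unchanged
```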


Example 3.10 Solve the system

$$\begin{aligned}x + 2y + 3z &= 6\\ 4x + 5y + 6z &= 7\\ 7x + 8y + 9z &= 8\end{aligned}$$

by row reduction. Check your answer.


Solution This is of the form AX = B with

      A
1 2 3
4 5 6
7 8 9
      B
6 7 8

Form the partitioned matrix [A|B|I] = M.

      M←A,B,ID 3
      M
1 2 3 6 1 0 0
4 5 6 7 0 1 0
7 8 9 8 0 0 1

If we use the 7 to clean up the first column, we get

      ⎕←M1←3 1 PIVOT M
1.79E¯18 8.57E¯1 1.71E0  4.86E0 1.00E0 0.00E0 ¯1.43E¯1
6.94E¯18 4.29E¯1 8.57E¯1 2.43E0 0.00E0 1.00E0 ¯5.71E¯1
7.00E0   8.00E0  9.00E0  8.00E0 0.00E0 0.00E0  1.00E0

The entries M1[1 2;1] are small enough to be assumed zero. We wish to reduce
A to echelon form, so we interchange rows 1 and 3 and make the 1;1 entry a
leading 1:

      ⎕←M2←1 1 LDR 1 3 SWITCH M1
1.00E0   1.14E0  1.29E0  1.14E0 0.00E0 0.00E0  1.43E¯1
6.94E¯18 4.29E¯1 8.57E¯1 2.43E0 0.00E0 1.00E0 ¯5.71E¯1
1.79E¯18 8.57E¯1 1.71E0  4.86E0 1.00E0 0.00E0 ¯1.43E¯1

Next pivot on the 2;2 entry and set the leading 1.

      ⎕←M3←2 2 LDR 2 2 PIVOT M2
1.00E0   ¯1.73E¯18 ¯1.00E0  ¯5.33E0  0.00E0 ¯2.67E0  1.67E0
1.62E¯17  1.00E0    2.00E0   5.67E0  0.00E0  2.33E0 ¯1.33E0
1.21E¯17  8.87E¯19  1.73E¯18 4.16E¯17 1.00E0 ¯2.00E0  1.00E0

Assuming the small entries to be zero, A has been reduced to the echelon form

$$\begin{bmatrix}1&0&-1\\0&1&2\\0&0&0\end{bmatrix}$$


We check this using the matrix G:

      ⎕←G←3 ¯3↑M3
0.000 ¯2.670  1.670
0.000  2.330 ¯1.330
1.000 ¯2.000  1.000
      G+.×A,B
1.00E0   0.00E0   ¯1.00E0  ¯5.33E0
2.78E¯17 1.00E0    2.00E0   5.67E0
6.94E¯18 1.39E¯17  1.39E¯17 4.16E¯17


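Taking the small entries to be zero, GA is in echelon form, and so we assume that
the calculation is correct.
There is one nonpivot column, E[;3]. Setting z = t, we have

$$\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}-5.33\\5.67\\0\end{bmatrix}+t\begin{bmatrix}1\\-2\\1\end{bmatrix}$$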
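Exact rational arithmetic avoids the round-off entirely. The Python sketch below (the name `reduce_rows` is ours; it uses the standard `fractions` module) redoes Example 3.10 without floating point, so the tiny entries of M3 become exact zeros and the solution family can be checked exactly. It is offered for comparison only; exact arithmetic becomes slow for large matrices.

```python
from fractions import Fraction

def reduce_rows(M, ncols):
    """Row-reduce M exactly, choosing pivots only in its first ncols columns."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for k in range(ncols):
        p = next((i for i in range(r, len(M)) if M[i][k] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        c = M[r][k]
        M[r] = [x / c for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return M

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [6, 7, 8]
EF = reduce_rows([row + [b] for row, b in zip(A, B)], 3)

def solution(t):
    """The solution with z = t, read from rows 1 and 2 of [E|F]."""
    t = Fraction(t)
    return [EF[0][3] - EF[0][2] * t, EF[1][3] - EF[1][2] * t, t]

print(EF)
print(solution(0))   # the exact values behind -5.33 and 5.67
```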


It is instructive to see just what happens in the last example if the entry
M3[3;3] = 1.73E¯18 is assumed nonzero. Then, pivoting on the 3;3 position and
setting the 3;3 entry equal to 1, we get

      ⎕←M4←3 3 LDR 3 3 PIVOT M3
 8.00E0 5.00E¯1 0.00E0  1.87E1  5.76E17 ¯1.15E18  5.76E17
¯1.40E1 0.00E0  0.00E0 ¯4.23E1 ¯1.15E18  2.31E18 ¯1.15E18
 7.00E0 5.00E¯1 1.00E0  2.40E1  5.76E17 ¯1.15E18  5.76E17

Clearly something is wrong, because the first two columns have lost their nice
form. The check also shows that something is wrong.

      ⎕←G←3 ¯3↑M4
 5.76E17 ¯1.15E18  5.76E17
¯1.15E18  2.31E18 ¯1.15E18
 5.76E17 ¯1.15E18  5.76E17
      G+.×A,B
2.000 4.000 2.000 20.000
8.000 0.000 0.000 32.000
4.000 8.000 8.000 24.000

This matrix is not in row echelon form. Therefore an error has been made.
We will use G as a check even though the problem is apparent in M4. We
need G for other reasons, and other codings of PIVOT can produce reasonable-
looking matrices M4 (see exercise 66).




Machine Computation Check If GA is close to an echelon form, accept the result.

The term "close" is not precisely defined and can depend on several factors,
such as experimental error in measured data. For the most part, in this text E
is close to an echelon form if one gets an echelon form by setting equal to zero
those components of E which have an absolute value smaller than ⎕CT times the
entries of A.
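One way to code this check is sketched below in Python. The name `close_to_echelon` is ours, and two choices are assumptions rather than the text's rule: "⎕CT times the entries of A" is read as ⎕CT times the largest magnitude in A, with ⎕CT taken as 1E¯13; and a leading entry is accepted as a 1 when it lies within the same tolerance of 1. The tests use the GA blocks computed in Example 3.10 and in the M4 experiment.

```python
def close_to_echelon(E, A, ct=1e-13):
    """Treat entries of E at most ct*max|A| as zero, then test echelon form."""
    tol = ct * max(abs(a) for row in A for a in row)

    def is_zero(x):
        return abs(x) <= tol

    lead, zero_row_seen = [], False
    for row in E:
        nonzero = [k for k, x in enumerate(row) if not is_zero(x)]
        if not nonzero:
            zero_row_seen = True
            continue
        k = nonzero[0]
        if zero_row_seen or abs(row[k] - 1) > tol or (lead and k <= lead[-1]):
            return False
        lead.append(k)
    return all(sum(not is_zero(row[k]) for row in E) == 1 for k in lead)

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
good = [[1.0, 0.0, -1.0], [2.78e-17, 1.0, 2.0], [6.94e-18, 1.39e-17, 1.39e-17]]
bad = [[2.0, 4.0, 2.0], [8.0, 0.0, 0.0], [4.0, 8.0, 8.0]]
print(close_to_echelon(good, A), close_to_echelon(bad, A))
```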


Inverses 


Gaussian reduction can be used to compute left inverses. If a matrix A is row-
reduced to E = GA and E has no nonpivot columns, then a left inverse for A may
be extracted from G.

First an observation about matrix multiplication. It follows immediately
from Proposition 2.1 that (M+.×N)[I;] = M[I;]+.×N. Thus if we take the
partitioned matrix M←P,[1]Q, which is often written

$$M=\begin{bmatrix}P\\Q\end{bmatrix}$$

we have

$$\begin{bmatrix}P\\Q\end{bmatrix}N=\begin{bmatrix}PN\\QN\end{bmatrix}$$

If E = GA has no nonpivot columns, then

$$E=\begin{bmatrix}I\\0\end{bmatrix}$$

where I is an identity and 0 is a zero matrix. If A and hence E are square, the
matrix 0 does not appear and E = I. Suppose that I is k by k and let G₁ = G[⍳k;].
Let G₂ be the remaining rows of G. Then

$$\begin{bmatrix}G_1A\\G_2A\end{bmatrix}=\begin{bmatrix}G_1\\G_2\end{bmatrix}A=GA=\begin{bmatrix}I\\0\end{bmatrix}$$

hence G₁A = I, G₂A = 0. In particular, G₁ is a left inverse for A.

It follows from Propositions 3.3 and 3.4 of the next section that this procedure
will always work: the echelon form of a matrix is unique, and if A has a left
inverse, then E has only pivot columns.

For this application of Gaussian reduction we do not have an equation
AX = B but simply a matrix A. Form the partitioned matrix [A|I] and row-reduce
to [E|G]. If E has only pivot columns, then a left inverse may be extracted from G.
If E has some nonpivot columns, then A does not have a left inverse.
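The procedure can be sketched in a few lines of Python with exact fractions (the name `left_inverse` is ours). It row-reduces [A|I] and gives up as soon as a column of A fails to receive a leading 1; otherwise it returns the first k rows of G, where k is the number of columns of A. The matrices of Examples 3.11 and 3.8 serve as tests.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def left_inverse(A):
    """Return G1 = G[⍳k;] from [A|I] -> [E|G], or None if E has a nonpivot column."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in A[i]] + [Fraction(int(i == j)) for j in range(m)]
         for i in range(m)]
    for k in range(n):                    # column k must receive a leading 1 in row k
        p = next((i for i in range(k, m) if M[i][k] != 0), None)
        if p is None:
            return None                   # column k is a nonpivot column
        M[k], M[p] = M[p], M[k]
        c = M[k][k]
        M[k] = [x / c for x in M[k]]
        for i in range(m):
            if i != k and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[k])]
    return [row[n:] for row in M[:n]]

A = [[1, -1, -1], [-2, 3, 1], [1, -2, 1], [0, -2, 2]]     # Example 3.11
G1 = left_inverse(A)
print(G1)
print(left_inverse([[1, -1, -2], [-2, 3, 5], [1, -2, -3], [0, -2, -2]]))  # None
```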




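Example 3.11 Does the matrix

$$A=\begin{bmatrix}1&-1&-1\\-2&3&1\\1&-2&1\\0&-2&2\end{bmatrix}$$

possess left inverses?
Solution Begin by pivoting on the 1;1 entry:

$$\begin{bmatrix}1&0&0&0\\2&1&0&0\\-1&0&1&0\\0&0&0&1\end{bmatrix}\left[\begin{array}{rrr|rrrr}1&-1&-1&1&0&0&0\\-2&3&1&0&1&0&0\\1&-2&1&0&0&1&0\\0&-2&2&0&0&0&1\end{array}\right]=\left[\begin{array}{rrr|rrrr}1&-1&-1&1&0&0&0\\0&1&-1&2&1&0&0\\0&-1&2&-1&0&1&0\\0&-2&2&0&0&0&1\end{array}\right]$$

Next, pivot on the 2;2 entry:

$$\begin{bmatrix}1&1&0&0\\0&1&0&0\\0&1&1&0\\0&2&0&1\end{bmatrix}\left[\begin{array}{rrr|rrrr}1&-1&-1&1&0&0&0\\0&1&-1&2&1&0&0\\0&-1&2&-1&0&1&0\\0&-2&2&0&0&0&1\end{array}\right]=\left[\begin{array}{rrr|rrrr}1&0&-2&3&1&0&0\\0&1&-1&2&1&0&0\\0&0&1&1&1&1&0\\0&0&0&4&2&0&1\end{array}\right]$$

Finally, pivot on the 3;3 entry:

$$\begin{bmatrix}1&0&2&0\\0&1&1&0\\0&0&1&0\\0&0&0&1\end{bmatrix}\left[\begin{array}{rrr|rrrr}1&0&-2&3&1&0&0\\0&1&-1&2&1&0&0\\0&0&1&1&1&1&0\\0&0&0&4&2&0&1\end{array}\right]=\left[\begin{array}{rrr|rrrr}1&0&0&5&3&2&0\\0&1&0&3&2&1&0\\0&0&1&1&1&1&0\\0&0&0&4&2&0&1\end{array}\right]=[E\,|\,G]$$

Here

$$E=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\\0&0&0\end{bmatrix}$$

and

$$G_1=\begin{bmatrix}5&3&2&0\\3&2&1&0\\1&1&1&0\end{bmatrix}$$

is a left inverse for A.
Check

$$\begin{bmatrix}5&3&2&0\\3&2&1&0\\1&1&1&0\end{bmatrix}\begin{bmatrix}1&-1&-1\\-2&3&1\\1&-2&1\\0&-2&2\end{bmatrix}=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}$$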


The process above does not produce the left inverse (AᵀA)⁻¹Aᵀ computed
by ⌹. Using the matrix of the last example, in fact,

      A
 1 ¯1 ¯1
¯2  3  1
 1 ¯2  1
 0 ¯2  2
      ⌹A
 0.047 0.524 2.000 ¯1.240
¯0.047 0.476 1.000 ¯0.762
¯0.143 0.429 1.000 ¯0.286

Left inverses are not unique, and the G₁ computed above depends on the pivots
used: a different choice of pivots can produce a different G₁. The algorithm used
by ⌹, however, does not involve pivoting. It reduces linear systems by using the
elementary reflections defined in Chapter 5. The resulting algorithm is less
sensitive to round-off error than Gaussian reduction.
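The numbers printed by ⌹ can be reproduced exactly. The Python sketch below forms (AᵀA)⁻¹Aᵀ from the normal equations using exact fractions; this is a different route from the reflections that ⌹ uses, and in floating point it would be the less stable one. The helper names `transpose`, `inverse`, and `matmul` are ours. Since det(AᵀA) = 21 here, 21 times the result has integer entries, which makes the comparison with the rounded APL display easy.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse(S):
    """Exact inverse of a square matrix by Gauss-Jordan reduction of [S|I]."""
    n = len(S)
    M = [[Fraction(x) for x in S[i]] + [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        c = M[k][k]
        M[k] = [x / c for x in M[k]]
        for i in range(n):
            if i != k and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[k])]
    return [row[n:] for row in M]

A = [[1, -1, -1], [-2, 3, 1], [1, -2, 1], [0, -2, 2]]
At = transpose(A)
L = matmul(inverse(matmul(At, A)), At)          # (A^T A)^-1 A^T
scaled = [[21 * x for x in row] for row in L]   # 21 = det(A^T A) clears denominators
print(scaled)
print([[float(x) for x in row] for row in L])
```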


EXERCISES 3.2
13-1 2 
ne, 40 a 
coerce Or 
Do a —de 2 




Write the pivot matrix to pivot on the given position.


1. 1;1 entry     2. 1;2 entry     3. 2;1 entry
4. 2;4 entry     5. 3;4 entry     6. 4;3 entry


Is the given matrix in row echelon form? If so, identify the pivot columns. 


10 0] 10 0 
zjoo41 8 ]0 00 
0 0 oO. 000 


1 
0 
4 12030 
0 00140 
10. 23 2 
0 pei A0-y 4 2100000 
0 00001 
12 0300) 
or 1010) oo -1 400 
af 4. Jo 001 Joo 0010 
0001 00 0001 


16. Show that if E is in row echelon form, then 1 1↓E is in row-echelon form.


Without computation, write the inverses of the elementary matrices given in exercises 17 
through 25. 


100 
7 Jo 10 
002 


901 0 a 

001 Patan 100 

2, jo 1 0 21. 2, jo 1 0 

10 0. ae O07 
0100 

102 1-400 Ans 0:00] 

23, fo 13 ufo 100 as, Jo 1 0 of 

oot jo 310) o11o 

ip rot o101 


Solve the linear systems in exercises 26 through 35 either by hand or using the functions 
PIVOT, SWITCH, and LDR. 


2. xt y+e= 9 7 x2 
det teal = = 
dy+z=10 x43) +3: 

2. 9 +5 452=33 29, 3x, + xp +2%y + 5x, 
x4 ytr=5 =X, + 342 + 


3x + dy +22 = 12 3x, 
2x, + 2x, 




30. xt et Ht = 1B 3h. 2x, tart oy + BKy 
Attet BP = UW Dey + 4g + 25 + 12x, 

=x, — 2x = &,=-17 x, +2 + x5 + 6x, =18 

xy + y+ 2x + B= 25 2x, + 4% + 4x, = 10 


33.0 x + 2y +32 
Bxt¢3y43e 


32, 3x, + 6x2 — 2xy — 2 
Ky + 2vy + ayy + 18K, 


3x, + 6x, — Ky + y= det the 
xy t2tp— xy — 2K 4etBv4 2 
-te= y+ 
Mode + Oy +32 = 45 
y+ 213 
W+se=at 
x+Sy +22 =25 
teat 
38. y= te Ny Ny 


Ay + 3xy + Sy + Ty 
= 2x, + 3xy + Sty + Thy + 9% + Xe = 


my = 2x, — dx, — 6x) + 2 
—2e, + Xp — 4y— Ixy — Stet xg= 
Ky — ky = 4Xy — ON, — Bxy — 2xy = —30 


In exercises 36 through 44 use Gaussian reduction to compute a left inverse, if one exists,
of the matrix A, either by hand or using the functions PIVOT, SWITCH, and LDR.


Pot =4 
38. d=|-2 -1 6 
12 =5, 


1-1 2, = 2 0 
1-1 3 '-t 0 a ; 
Oe 0) =2 2301 a ok 
9%. A=|-2 1-3) 4 A=]-1 2 2] 41. A= 1 2 
-!-1 0 1-1 -1 Fen 
1 =3 8 2 o - 
_ —l =2. 
1 1 “i 10 =-2 -1 1-2 5 -4 
WEN 38 = Si ee | 2 see 
42. |-2 -4 -16 43. vi -3 -4 44 o -2 5s -2 
0-2 -6 -—1 1 o-t io=2 4-3. 
-2 -4 -16 


In exercises 45 through 49 use SWITCH, PIVOT, and LDR to compute the echelon form 
of the given matrix. Check your answers. 


47 50 244 30) BO ae 

45. [55 83 362 64 = [2 Ses 
9 17 18 oe ea tS 
446 636-523 341 




57 99 «S188 
=145 


9 3 4 =) -15 30 
47. | 355 97 494 94 3 —30 gy enna, oe 


6 32 2 
59-67 —69 = 
69 IL ~150, erase 


—11 353 4350 a7 = 48-97 
-8 -61 -81 67 84-1 
-79 77 91 932 420688 

67-491 607 409-192, —295, 


49. 


Compute by hand the results of the expressions in exercises 50 through 65. 


30. v3 1627 Slo wa1627 52. va 1627 
Qe (BeeV) on Vier ov) 0M 
53. Wet 2 6-4 “5 34. Wet 26 4 5 55. Vei20 
(vey iv ceoyry ((Ver5 )AvS5) Vv 
36, And 319 ST. Ana 3009 38. And 3y19 
141tA Va 24a 
59. And 3018 60. Ae3 318 lL And apo 
0 24a 2 21a 221A 
62. An3 3919 63. Ae3 319 64. ACs ayo 
0 31 ora 3 RA 
65. And 309 
01a 


66. (Computer assignment) A version of PIVOT that does not create a pivot matrix is

    ∇ Z←X PIVOT2 A;C
[1]   C←X[1]≠⍳1↑⍴A
[2]   Z←A-(C×A[;X[2]])∘.×A[X[1];]÷A[X[1];X[2]]
    ∇

To pivot on i;j, this version creates a matrix whose rows are the proper multiples of
row A[i;] and then subtracts this matrix from A.


(a) Redo Example 3.10 using PIVOT2 instead of PIVOT. How do the new matri-
ces M1, M2, M3 compare with the M1, M2, M3 in Example 3.10?


(b) Create M4 by pivoting M3 on the 3;3 entry. Is M4 an echelon form?


(c) If M4 were an echelon form, then M4[;4] would be a solution of the original
system. Is M4[;4] a solution of the original system?


(d) Use the new G to check the accuracy of M4.


67. In solving AX = B, one row-reduces [A|B] to an echelon form E. The solutions are
then written in the form

$$X = v_0 + t_1v_1 + t_2v_2 + \cdots + t_kv_k$$

or X = v₀ + PT, where P[;i] = vᵢ and T[i] = tᵢ. There is one parameter tᵢ for each
nonpivot column of E. Suppose that the nonpivot columns of E are E[;V]. Show that

P[V;] = ID ⍴V

and then apply Proposition 2.11 to show that the columns of P are linearly independent.




3.3. The Echelon Form of a Matrix 


The importance of the echelon form of a matrix is this: The echelon form of A is
an explicit list of the linear relationships among the columns of A.
Consider a typical echelon form:

$$E=\begin{bmatrix}1&0&2&0&4&0\\0&1&3&0&5&0\\0&0&0&1&6&0\\0&0&0&0&0&1\\0&0&0&0&0&0\end{bmatrix}$$

The pivot columns are E[;1 2 4 6]. The nonpivot columns of E are linear
combinations of the pivot columns, in fact of the preceding pivot columns. Let
V = (1,2,4,6). Then

$$E[;3]=2E[;1]+3E[;2]=E[;1\ 2]\begin{bmatrix}2\\3\end{bmatrix}=E[;V]\begin{bmatrix}2\\3\\0\\0\end{bmatrix}$$

$$E[;5]=4E[;1]+5E[;2]+6E[;4]=E[;1\ 2\ 4]\begin{bmatrix}4\\5\\6\end{bmatrix}=E[;V]\begin{bmatrix}4\\5\\6\\0\end{bmatrix}$$

Further, the pivot columns are linearly independent. This follows easily from
Proposition 2.11, which states that the columns of A are linearly independent if
and only if the equation AX = 0 has only the trivial solution X = 0. For, letting
I = ID 4, we have, if E[;V]X = 0,

$$0=E[;V]X=\begin{bmatrix}I\\0\end{bmatrix}X=\begin{bmatrix}X\\0\end{bmatrix}$$

hence X = 0.

Now suppose that E was derived from A by row reduction, that is,
E = GA, where G is a product of elementary matrices. Since G is a product of
elementary matrices, G is square and invertible. If G = PₖPₖ₋₁⋯P₁, where the
Pᵢ are elementary matrices, then G⁻¹ = P₁⁻¹P₂⁻¹⋯Pₖ⁻¹, also a product of elemen-
tary matrices, and A = G⁻¹E. It follows that the columns of A[;V] are also linearly
independent.
If
A[;V]X = 0
then

GA[;V]X = 0


33° The Echelon Form of a Matrix 155 


or
E[;V]X = 0
and so X = 0. Further,

$$A[;3]=(G^{-1}E)[;3]=G^{-1}E[;3]=G^{-1}E[;V]\begin{bmatrix}2\\3\\0\\0\end{bmatrix}=A[;V]\begin{bmatrix}2\\3\\0\\0\end{bmatrix}=A[;1\ 2]\begin{bmatrix}2\\3\end{bmatrix}$$

or A[;3] = 2A[;1] + 3A[;2].
Similarly,
A[;5] = 4A[;1] + 5A[;2] + 6A[;4]


We record these facts for future reference.


Proposition 3.1 Let A be row-reduced to a matrix E in echelon form. Let V be
the vector of indices of the pivot columns of E (in ascending order). Then


1. The columns of A[;V] are linearly independent.
2. Given a column index j, let W be the components of V less than or equal to
j (i.e., W←(V≤j)/V), and let C be the first ⍴W components of E[;j] (i.e.,
C←(⍴W)↑E[;j]). Then

A[;j] = A[;W]+.×C


This proposition will apply to all cases, such as E a matrix of zeros, if some
conventions are observed.

In the case that E is an m-by-n zero matrix, we take V to be ⍳0, the vector
without components. Then E[;V] and A[;V] are m by 0 and represent an empty
set of vectors in Rᵐ. We consider an empty set of vectors to be linearly indepen-
dent. We also consider matrices without rows or without columns (i.e., 0∊⍴A) to
be in echelon form.

These conventions are not arbitrary; rather they are, for the most part, vari-
ants of the convention that a sum over an empty index set is zero (0=+/⍳0; see
exercise 50 in Exercises 1.2 and exercise 8 in Exercises 2.1). Such conventions
avoid the separate consideration of special cases and are quite helpful in coding
functions for machine execution.






Example 3.12 Extract a linearly independent set of vectors from the set
{(0,0,0), (2,1,1), (0,1,0), (4,5,2)}


and express the other vectors as linear combinations of the linearly independent
set.


Solution Set up the vectors as columns of a matrix A and row-reduce A to its
echelon form E:

$$A=\begin{bmatrix}0&2&0&4\\0&1&1&5\\0&1&0&2\end{bmatrix},\qquad E=\begin{bmatrix}0&1&0&2\\0&0&1&3\\0&0&0&0\end{bmatrix}$$

The pivot columns are E[;2 3], and so a linearly independent set is A[;2 3],
or (2,1,1) and (0,1,0). From E[;4] we see that (4,5,2) = 2(2,1,1) + 3(0,1,0), or

      V←2 3
      A[;V]+.×2 3
4 5 2

or we can apply the formulas of Proposition 3.1:

      W←(V≤4)/V
      C←(⍴W)↑E[;4]
      A[;W]+.×C
4 5 2


The other vector, (0,0,0), can of course be expressed as (0,0,0) =
0(2,1,1) + 0(0,1,0). Proposition 3.1, on the other hand, says that it should be
a linear combination of the elements of A[;V] that precede it. The set of such
vectors is empty, but the formulas still apply.

      W←(V≤1)/V
      C←(⍴W)↑E[;1]
      A[;W]+.×C
0 0 0

From Proposition 3.1 we immediately have


Proposition 3.2
1. Any set of n + 1 vectors in Rⁿ is linearly dependent.


2. If v₁, ..., vₙ are linearly independent vectors in Rⁿ, then every vector in Rⁿ is
a unique linear combination of v₁, ..., vₙ.


Proof Let the n + 1 vectors form the columns of A and row-reduce to an echelon
form E. The matrix E is n by n + 1, and, since two leading 1's cannot occur in the
same row, there are at most n pivot columns. Hence there is at least one nonpivot
column. This proves statement 1. For statement 2 let vᵢ = A[;i] and
v = A[;n+1]. Then the first n columns of E are pivot columns (by Proposition 3.1,
a nonpivot column among them would make the corresponding vᵢ a linear
combination of the preceding vectors, contradicting independence), and the last
is a nonpivot column and hence a linear combination of the pivot columns. If we
interpret E as the echelon form of the system A[;⍳n]X = v, we see that this
solution is unique. ∎


Example 3.13 Show that every vector in R³ is a linear combination of the vectors
v₁ = (1,1,1), v₂ = (1,2,−1), and v₃ = (2,3,1). Write the vectors u = (19,30,4),
v = (23,36,5), and w = (27,42,6) as linear combinations of v₁, v₂, v₃.


Solution Imitating the proof of Proposition 3.2, we make v₁, v₂, v₃, u, v, w the
columns of a matrix A and row-reduce the matrix to echelon form. By the meth-
ods of Section 3.2 we get

$$GA=\begin{bmatrix}5&-3&-1\\2&-1&-1\\-3&2&1\end{bmatrix}\begin{bmatrix}1&1&2&19&23&27\\1&2&3&30&36&42\\1&-1&1&4&5&6\end{bmatrix}=\begin{bmatrix}1&0&0&1&2&3\\0&1&0&4&5&6\\0&0&1&7&8&9\end{bmatrix}=E$$

Thus A[;1 2 3] are three linearly independent vectors in R³; hence every vector
in R³ is a unique linear combination of v₁, v₂, v₃. In particular,

$$\begin{aligned}u&=v_1+4v_2+7v_3\\ v&=2v_1+5v_2+8v_3\\ w&=3v_1+6v_2+9v_3\end{aligned}$$
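Example 3.13 can be replayed with exact arithmetic. In the Python sketch below the name `reduce_with_G` is ours; it pivots only in the first three columns and returns both E and G. Because A[;1 2 3] is invertible, G must equal its inverse whatever pivots are chosen, so the sketch should reproduce the G shown in the example, and the last three columns of E should hold the coefficients of u, v, and w.

```python
from fractions import Fraction

def reduce_with_G(M, ncols):
    """Reduce [M|I] exactly, pivoting only in the first ncols columns of M.
    Returns (E, G) with E = G M."""
    m, width = len(M), len(M[0])
    R = [[Fraction(x) for x in M[i]] + [Fraction(int(i == j)) for j in range(m)]
         for i in range(m)]
    r = 0
    for k in range(ncols):
        p = next((i for i in range(r, m) if R[i][k] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        c = R[r][k]
        R[r] = [x / c for x in R[r]]
        for i in range(m):
            if i != r and R[i][k] != 0:
                f = R[i][k]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == m:
            break
    return [row[:width] for row in R], [row[width:] for row in R]

v1, v2, v3 = (1, 1, 1), (1, 2, -1), (2, 3, 1)
u, v, w = (19, 30, 4), (23, 36, 5), (27, 42, 6)
A = [list(row) for row in zip(v1, v2, v3, u, v, w)]   # the vectors become columns
E, G = reduce_with_G(A, 3)
coeffs = [[E[i][j] for i in range(3)] for j in range(3, 6)]  # for u, v, w
print(coeffs)
```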




Now we wish to prove a basic result: a matrix has one and only one row 
echelon form. To prove that every matrix has an echelon form we must show that 
the row-reduction process works. Since anyone who has row-reduced a few matri- 
ces knows that the process will work, a formal proof of the fact may seem a waste 
of time. There is, however, a practical reason for writing down a proof. The tasks 
of proving that row-reduction works, describing the row-reduction process in
detail, and coding row-reduction for machine execution are closely related.


Proposition 3.3 Every matrix may be row-reduced to a unique row echelon 
form. 


Proof By the term "row-reduce A" we mean "multiply A by a sequence of ele-
mentary matrices, G = PₖPₖ₋₁⋯P₁, so that E = GA is in row echelon form."

If A is a zero matrix, then it is already in row echelon form, and G is an
identity matrix that we consider to be an elementary matrix (it multiplies rows by
1). If A ≠ 0, then there is a first column, A[;k] say, which is nonzero. Suppose that
A[r;k] ≠ 0. Let P₁ be the switch matrix that interchanges rows 1 and r, let P₂ be
the pivot matrix for the 1;k entry of P₁A, and let P₃ be the multiplier matrix that
sets the 1;k entry of P₂P₁A to 1. Let A₁ = P₃P₂P₁A.

We have now completed one cycle of the row-reduction process and have the
following setup with l = 1 and p = k.


(i) The first p columns of Aₗ are an echelon form with l nonzero rows.


(ii) If the matrix Bₗ, obtained by dropping the first l rows and p columns from Aₗ,
is a zero matrix, then Aₗ is in row echelon form.


To check statement (i) notice that the only nonzero entry of A₁[;⍳k] is 1 in the
1;k position. So the conditions of Definition 3.2 certainly hold for A₁[;⍳k]. Fur-
ther, if B₁ = 0, then A₁[1;] is the only nonzero row of A₁, and so the conditions
hold for A₁ as well.

Now suppose that conditions (i) and (ii) hold for some l ≥ 1. If Bₗ ≠ 0, we
carry out another cycle of the row-reduction process. Suppose that Bₗ[;k] is the
first nonzero column of Bₗ and that Bₗ[r;k] ≠ 0. Use P₁ to interchange rows
l + 1 and l + r in Aₗ, use P₂ to pivot on the l + 1; p + k entry of P₁Aₗ, and use P₃
to set the l + 1; p + k entry of P₂P₁Aₗ equal to 1, giving Aₗ₊₁ = P₃P₂P₁Aₗ.

Condition (i) holds for Aₗ₊₁ with l replaced by l + 1 and p replaced by p + k.
First, since Bₗ[;k] is the first nonzero column of Bₗ, Aₗ₊₁[;⍳p+k] has only l + 1
nonzero rows, and Aₗ₊₁[⍳l+1;p+k] is zero except for a 1 in the last position.
This 1 is the only nonzero entry in its column. Since the entries of
Aₗ[;⍳p+k−1] have not changed, (i) is true.

If Bₗ₊₁ = 0, then Aₗ₊₁[l+1;] is the last nonzero row and Aₗ₊₁ is in row echelon
form.


Recall that a matrix without rows or without columns is a zero matrix. Thus
we will get Bₗ = 0 when we run out of rows or columns, if not before, and so the
process must end. The matrix A has then been row-reduced.




Next we must show that there is only one echelon form. That is, if E₁ = G₁A
and E₂ = G₂A are in row echelon form, where G₁ and G₂ are products of
elementary matrices, then E₁ = E₂. Notice that we do not assert that G₁ and G₂
are equal. During each cycle of the row-reduction process one has a choice of
pivots, and different choices produce different G's.

Now A = G₁⁻¹E₁ = G₂⁻¹E₂, and so E₁ = G₁G₂⁻¹E₂. Since G₁ and G₂ are prod-
ucts of elementary matrices, so too is G₁G₂⁻¹, and so E₂ may be row-reduced to
E₁; similarly E₁ may be row-reduced to E₂. This means that we may apply Propo-
sition 3.1 with either one of the matrices Eᵢ as A and the other as E.

If E₁ ≠ E₂, then there is a first column, the kth say, at which they differ.

Now E₁[;k] and E₂[;k] cannot both be nonpivot columns, because then, ac-
cording to Proposition 3.1, they would be linear combinations, with identical
coefficients, of the previous columns, in which E₁ and E₂ do not differ. If one, say
E₁[;k], is a nonpivot column and E₂[;k] is a pivot column, then E₂[;k] is independ-
ent of the previous pivot columns, and so, by Proposition 3.1, E₁[;k] must also be
independent of the previous columns. But if E₁[;k] is a nonpivot column, it is not
independent of the previous pivot columns. The matrices E₁ and E₂ may be
exchanged in this argument. Thus, we cannot have one of Eᵢ[;k] a pivot column
and the other a nonpivot column. The only remaining choice is that both E₁[;k]
and E₂[;k] are pivot columns and hence can differ only in the position of the
leading 1. But if E₁[;⍳k−1] = E₂[;⍳k−1] has r nonzero rows, then this
leading 1 must appear in the (r + 1)st row in both matrices. This contradiction
shows that E₁ = E₂. ∎
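The distinction drawn in this proof, one E but possibly many G's, is easy to observe. The Python sketch below (names `reduce_with`, `first`, and `largest` are ours) row-reduces the matrix of Example 3.10 twice with exact fractions: once taking the first usable row as pivot, once taking the row of largest magnitude. Both runs reach the same echelon form, while the accumulated G's differ.

```python
from fractions import Fraction

def reduce_with(A, choose):
    """Row-reduce A exactly; choose(rows, k, M) picks the pivot row.
    Returns (E, G) with E = G A."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in A[i]] + [Fraction(int(i == j)) for j in range(m)]
         for i in range(m)]
    r = 0
    for k in range(n):
        rows = [i for i in range(r, m) if M[i][k] != 0]
        if not rows:
            continue
        p = choose(rows, k, M)
        M[r], M[p] = M[p], M[r]
        c = M[r][k]
        M[r] = [x / c for x in M[r]]
        for i in range(m):
            if i != r and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return [row[:n] for row in M], [row[n:] for row in M]

def first(rows, k, M):
    return rows[0]

def largest(rows, k, M):
    return max(rows, key=lambda i: abs(M[i][k]))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
E1, G1 = reduce_with(A, first)
E2, G2 = reduce_with(A, largest)
print(E1 == E2, G1 == G2)
```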


Now we are in a position to prove Propositions 2.5 and 2.9. The next proposi- 
tion includes Proposition 2.9. 


Proposition 3.4 Let A be a matrix. The following statements are all true or all 
false. 


1. The matrix A has a left inverse.
2. The row echelon form of A is $\begin{bmatrix} I \\ 0 \end{bmatrix}$, where I is an identity matrix.
3. The columns of A are linearly independent.


Proof It was shown in Section 2.3 that statement 1 implies statement 3. Suppose
that statement 3 is true. Let E be the row echelon form of A, and E[;J] the pivot
columns. Now if there were a nonpivot column E[;j], then A[;j] would be a linear
combination of the columns of A[;J]. This cannot be if statement 3 is true. Thus
E has no nonpivot columns. That is, $E = \begin{bmatrix} I \\ 0 \end{bmatrix}$.

We saw in Section 3.2 that if statement 2 is true, then statement 1 is true. In
fact, if $GA = \begin{bmatrix} I \\ 0 \end{bmatrix}$, then G[⍳1↓⍴A;] is a left inverse for A. ∎
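The last step of the proof can be checked on a small example. This Python sketch (hypothetical helper names, exact `Fraction` arithmetic, our own illustration) row-reduces [A | I] to obtain G with GA = [I; 0] for a 3-by-2 matrix with independent columns, then keeps the first n rows of G, the analogue of G[⍳1↓⍴A;].

```python
from fractions import Fraction


def gauss_exact(rows):
    """Return G (exact Fractions) such that G*A is the reduced echelon form of A.

    G is read off from the right-hand block after row-reducing [A | I].
    """
    m, n = len(rows), len(rows[0])
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(rows)]
    l = 0
    for k in range(n):                      # pivots are chosen in the A block only
        if l == m:
            break
        r = next((i for i in range(l, m) if M[i][k] != 0), None)
        if r is None:
            continue
        M[l], M[r] = M[r], M[l]
        piv = M[l][k]
        M[l] = [x / piv for x in M[l]]
        for i in range(m):
            if i != l and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[l])]
        l += 1
    return [row[n:] for row in M]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2], [3, 4], [5, 6]]                # 3 by 2, independent columns
G = gauss_exact(A)
n = len(A[0])
print(matmul(G, A))                         # [I; 0]: a 2-by-2 identity over a zero row
left_inverse = G[:n]                        # first n rows, like G[⍳1↓⍴A;]
print(matmul(left_inverse, A))              # the 2-by-2 identity
```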


The next proposition includes Proposition 2.5.


160 Systems of Linear Equations 


Proposition 3.5 Let A be a matrix.


1. An invertible matrix is a product of elementary matrices. In particular, an
invertible matrix is square.

2. If A is square and has a left inverse, then A is invertible.

3. If A is square and has a right inverse, then A is invertible.


Proof
1. Suppose that A is invertible. By Proposition 3.4 there is a product of elemen-
tary matrices G such that

$$GA = \begin{bmatrix} I \\ 0 \end{bmatrix}$$

Since G is square, A and $\begin{bmatrix} I \\ 0 \end{bmatrix}$ have the same shape. Thus we wish to show that the 0 does not actually appear.

Multiplying on the right by $A^{-1}$ gives

$$G = GAA^{-1} = \begin{bmatrix} I \\ 0 \end{bmatrix} A^{-1}$$

Now G is invertible, and so

$$GG^{-1} = I = \begin{bmatrix} I \\ 0 \end{bmatrix} A^{-1}G^{-1}$$

So the rows of zeros do not appear (they do not appear in I) and $A^{-1}G^{-1} = I$,
that is, $A = G^{-1}$. But G is a product of elementary matrices, and hence so is A: if
$G = P_kP_{k-1}\cdots P_1$, then $G^{-1} = P_1^{-1}P_2^{-1}\cdots P_k^{-1}$, and the inverse of an elementary matrix is again elementary.

2. Since A has a left inverse, there is a product of elementary matrices G such
that

$$GA = \begin{bmatrix} I \\ 0 \end{bmatrix}$$

by Proposition 3.4, and since A is square, the rows of zeros do not appear. Thus
$GA = I$; hence $A = G^{-1}GA = G^{-1}I = G^{-1}$, and so A is invertible with
$A^{-1} = G$.

3. Statement 3 follows from statement 2 by taking transposes. ∎


The echelon form of a matrix is unique. Thus no matter how we row-reduce A 
to E, we always obtain the same number of pivot columns. 




DEFINITION 3.3 The rank of a matrix is the number of pivot columns in its row
echelon form.


The rank of A is the minimum number of columns needed to generate all the
columns of A as linear combinations.

In the examples that follow and in the exercises at the end of this section a
number of row-reductions must be performed. These computations are tedious if
done by hand or even using the functions SWITCH, PIVOT, and LDR defined in
Section 3.2. In Section 3.5 the advanced function-writing techniques discussed in
Section 3.4 are used to define two functions, GAUSS and ECHELON. The expres-
sion G←GAUSS A defines G to be an invertible matrix such that E = GA is in
row echelon form. The expression E←ECHELON A defines E to be the echelon
form of A. We will use these expressions in the examples below to briefly indicate
that a row reduction has been performed.


EXAMPLE 3.14 Find a linearly independent subset of the set of vectors

(3, 5, −2, 4), (9, 15, −6, 12), (2, 1, 0, 0),
(−1, −4, 2, −4), (2, 8, −4, 8), (0, 2, −1, 2)

Express the other vectors as linear combinations of the independent vectors.

Solution Store the vectors as columns of a matrix:

      A
 3  9  2 ¯1  2  0
 5 15  1 ¯4  8  2
¯2 ¯6  0  2 ¯4 ¯1
 4 12  0 ¯4  8  2


and reduce to echelon form: 


      ECHELON A
1.000E0    3.000E0    0.000E0   ¯1.000E0    2.000E0    6.939E¯18
2.776E¯17  6.939E¯17  1.000E0    1.000E0   ¯2.000E0    6.939E¯18
3.469E¯17 ¯5.551E¯17  0.000E0    2.082E¯17  4.163E¯17  1.000E0
0.000E0    0.000E0    0.000E0    0.000E0    0.000E0    0.000E0

Assuming the small numbers to be zero, the pivot columns are E[;1 3 6].
Thus an independent set is

      A[;1 3 6]




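The display merely indicates that, to three significant digits,

$$A[;2] = \begin{bmatrix} 9 \\ 15 \\ -6 \\ 12 \end{bmatrix} = 3A[;1] + 0\,A[;3] + 0\,A[;6] = 3\begin{bmatrix} 3 \\ 5 \\ -2 \\ 4 \end{bmatrix}$$

$$A[;4] = \begin{bmatrix} -1 \\ -4 \\ 2 \\ -4 \end{bmatrix} = -A[;1] + A[;3] + 0\,A[;6] = -\begin{bmatrix} 3 \\ 5 \\ -2 \\ 4 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$

$$A[;5] = \begin{bmatrix} 2 \\ 8 \\ -4 \\ 8 \end{bmatrix} = 2A[;1] - 2A[;3] + 0\,A[;6] = 2\begin{bmatrix} 3 \\ 5 \\ -2 \\ 4 \end{bmatrix} - 2\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$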
where vectors have been written as column vectors. Carrying out the computa- 
tions shows the equations to be exactly true. 
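A floating-point version of this computation can be sketched in Python. The function `rref_float` below is a hypothetical helper, not the book's ECHELON: it uses partial pivoting and treats an entry as zero when it is at most `tol` times the largest magnitude in the input (the value 1e-12 is our assumption). It recovers the pivot columns 1, 3, 6 (indices 0, 2, 5 in Python) and reads the coefficients 3, 0, 0; −1, 1, 0; and 2, −2, 0 from the nonpivot columns of the echelon form.

```python
def rref_float(rows, tol=1e-12):
    """Reduced row echelon form with partial pivoting; returns (E, pivot_columns).

    An entry counts as zero when |entry| <= tol * (largest |entry| of the input).
    """
    E = [[float(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    scale = max(abs(x) for row in E for x in row)
    pivots, l = [], 0
    for k in range(n):
        if l == m:
            break
        # partial pivoting: the candidate row with the largest |entry| in column k
        r = max(range(l, m), key=lambda i: abs(E[i][k]))
        if abs(E[r][k]) <= tol * scale:
            continue  # no usable pivot: column k is a nonpivot column
        E[l], E[r] = E[r], E[l]
        piv = E[l][k]
        E[l] = [x / piv for x in E[l]]  # leading 1
        for i in range(m):
            if i != l:
                f = E[i][k]
                E[i] = [a - f * b for a, b in zip(E[i], E[l])]
        pivots.append(k)
        l += 1
    return E, pivots


vectors = [(3, 5, -2, 4), (9, 15, -6, 12), (2, 1, 0, 0),
           (-1, -4, 2, -4), (2, 8, -4, 8), (0, 2, -1, 2)]
A = [list(row) for row in zip(*vectors)]   # store the vectors as columns
E, pivots = rref_float(A)
print([k + 1 for k in pivots])             # 1-based, as in the text: [1, 3, 6]
for j in (1, 3, 4):                        # nonpivot columns 2, 4, 5
    coeffs = [round(E[i][j], 6) for i in range(len(pivots))]
    print(j + 1, coeffs)
```

The coefficients of a nonpivot column are simply its entries in the pivot rows, because each pivot column of the echelon form is a column of the identity.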


Many superficially different problems can be reduced to the kind of computa-
tion done in the last example. As an illustration of this, we will show how to give a
rigorous meaning, for linear equations at least, to such often-heard statements as
“Two equations in three unknowns leave one degree of freedom” and “We have
four equations and three unknowns, hence the system is overdetermined and has
no solution.”

To avoid confusion, notice that although we begin with the familiar system

$$AX = B$$

we do not now reduce the augmented matrix [A | B]. Rather we now reduce the
matrix $[A \mid B]^T$. This is no way to solve the system of linear equations, but it does
yield the theoretical result we are after.

First, notice that we have been treating individual linear equations of the
form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$$

as though they were vectors. In the row-reduction process this equation becomes a
row vector in the augmented matrix [A | B]. It is multiplied and added as though
it were the vector

$$(a_1, a_2, \ldots, a_n, b)$$

in $\mathbf{R}^{n+1}$.




We say that a linear equation is a linear combination of other linear equations 
if the statement is true for the corresponding vectors. A system of linear equations 
in which some equations are linear combinations of the others is called redundant.
If the equations are independent, the system is irredundant.


EXAMPLE 3.15 From the set of linear equations

$$\begin{aligned}
-460x_1 + 76x_2 + 67x_3 - 541x_4 + 219x_5 &= 313 \\
-285x_1 + 35x_2 + 40x_3 - 240x_4 + 110x_5 &= 135 \\
155x_1 + 83x_2 - 9x_3 - 673x_4 + 157x_5 &= 424 \\
-102x_1 + 68x_2 - 34x_3 + 272x_4 + 170x_5 &= 306
\end{aligned}$$

extract an irredundant set. Express the remaining equations, if any, as a linear
combination of the irredundant set.


Solution We set up the corresponding vectors as columns of a matrix A. Notice
that A is the transpose of the partitioned matrix used to solve the system.

      A
¯460 ¯285  155 ¯102
  76   35   83   68
  67   40   ¯9  ¯34
¯541 ¯240 ¯673  272
 219  110  157  170
 313  135  424  306


Reduce this matrix to echelon form: 


      ECHELON A
1.000E0   ~0        5.230E0   ~0
~0        1.000E0  ¯8.986E0   ~0
~0        ~0        ~0        1.000E0
~0        ~0        ~0        ~0
~0        ~0        ~0        ~0
~0        ~0        ~0        ~0

(Here ~0 stands for an entry that is merely a small roundoff quantity, of the kind displayed in Example 3.14.)

The pivot columns are E[;1 2 4] and, to three significant figures,
E[;3] = 5.230 E[;1] − 8.986 E[;2].

Thus equations 1, 2, and 4 form an irredundant set and equation 3 is, approx-
imately, 5.230 times equation 1 minus 8.986 times equation 2.
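Because the coefficients here are integers, the relation can be checked exactly. This Python sketch (our own check, not part of the book's method) takes the coefficients and right-hand sides of the equations as we read them from the columns of A, solves for the multipliers of equations 1 and 2 using only the $x_1$ and $x_2$ coefficients (Cramer's rule), and then verifies every component of equation 3. The multipliers come out as 727/139 ≈ 5.230 and −1249/139 ≈ −8.986, and the relation holds exactly, so the small entries of the echelon display are pure roundoff.

```python
from fractions import Fraction

# Coefficients and right-hand side of each equation, as a vector in R^(n+1)
eq1 = [-460, 76, 67, -541, 219, 313]
eq2 = [-285, 35, 40, -240, 110, 135]
eq3 = [155, 83, -9, -673, 157, 424]

# Solve alpha*eq1 + beta*eq2 = eq3 from the x1 and x2 coefficients (Cramer's rule)
det = eq1[0] * eq2[1] - eq2[0] * eq1[1]
alpha = Fraction(eq3[0] * eq2[1] - eq2[0] * eq3[1], det)
beta = Fraction(eq1[0] * eq3[1] - eq3[0] * eq1[1], det)

# Check every component, including the right-hand side
residual = [alpha * a + beta * b - c for a, b, c in zip(eq1, eq2, eq3)]
print(alpha, beta, round(float(alpha), 3), round(float(beta), 3))
print(all(r == 0 for r in residual))
```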


The next proposition is a fundamental result about the rank of a matrix. The 
proof must wait until the concept of a vector space has been developed. 


Proposition 3.6 The matrices A and $A^T$ have the same rank.




Assuming 3.6 is true, we have 


Proposition 3.7 

1. Any system of n + 2 linear equations in n unknowns is redundant.

2. An irredundant system of n + 1 linear equations in n unknowns has no solu-
tions.

3. An irredundant system of k linear equations in n unknowns with k ≤ n has
solutions. If n = k, there is a unique solution; otherwise the solutions involve
n − k arbitrary parameters.


Proof Given k equations in n unknowns, we have k vectors in $\mathbf{R}^{n+1}$. If
k > n + 1, these vectors are linearly dependent by Proposition 3.2, and this
proves statement 1.

Assume now that the k equations are irredundant and are written in the
matrix form AX = B. Then the partitioned matrix [A | B] is k by n + 1 and the
matrix $[A \mid B]^T$ has rank k. Thus by Proposition 3.6 the matrix [A | B] has rank k.
This means there are no rows of zeros at the bottom. If k = n + 1, this means the
echelon form of [A | B] is

$$\begin{bmatrix} I & 0 \\ 0 & 1 \end{bmatrix}$$

and hence the original system can be reduced to a system containing the equation
0 = 1. This proves statement 2. If k ≤ n, then we have solutions, and there is an
arbitrary parameter for each nonpivot column. ∎


EXERCISES 3.3


In exercises 1 through 5, (a) show that every vector in $\mathbf{R}^n$ is a linear combination of the
vectors $v_1, v_2, \ldots, v_n$; (b) write the vectors $w_1, w_2, \ldots$ as linear combinations of
$v_1, v_2, \ldots, v_n$.


0, =2, 3), wy = (0.2, —3), 3 = (—2, 2,1) 

A vy =(-2,3,1), v2 =(—2.2, Dh v5 ia) 

=1,2.0, We = (2. =D, wy = 3, —4, =) 

= (=2.1, 1,0), vg = (1,2, -1.3), 5 = (1.0.0, 1, oy 
3, =3,2, -2), wy = (1, -2, L, =I). ws = (3, =H, 1,0) 


0.1, -1.1) 




In exercises 6 through 10 compute the rank of the given matrix. 


6 fl 0 oO i =" 8 Le Ay) 
Ties : 2 -| 1) Wet ay 3 
yi lo 61-2] -2 1-1 -1 

1-2-4 5 

9. to a 10 10 10 po 
Eareersu 108g! 
ieee dl 
=—6 3 =3\ =I 


In exercises 11 through 15 find a linearly independent subset of the given set of vectors and
express the other vectors as linear combinations of this subset.
teu, 


2. y, 


(1. =), by = 2. 4), vy = 0.1), 4) = GB, -2) 
2, =. vs = 4, =2), 05 = (-1. DY = 2D) 


13. 0, = (1,0), vy = (1,0, ~2), vy = (~2, -1, -2) 
04 =(-3, -2, -2), 05 =(=1 =1, ty =2,-2) 
14, oy =(-2,-1, -2), vy = 3.1.2), oy = 65 
04 = (7.3.6). 0s = (1,0. 1) Ug = (91,6) 
15. uy (0.1.1, =I, 05 = (2, -6, -4, -2) 


Y= 


For exercises 16 through 20: (a) Extract an irredundant set of equations from the given set.
(b) Without solving the irredundant system, state whether it has no solution, a unique
solution, or an infinity of solutions. If the latter, how many arbitrary parameters are there
in the solutions?


16. 


1% 6x— 
4x +7 
ax + 


18 


lx, + x2 — 4x; — 5%, 
=23x, + Sxp— 7x3 + 2% 
Wy Gt 
Sp at ck: iy 


21. Show that a rank 1 matrix A can be written as V∘.×W for suitable vectors V and W.
Hint: Write $A = G^{-1}E$. How many columns of $G^{-1}$ are relevant?




22. Show that A and B have the same echelon form if and only if there is an invertible
matrix F such that B = FA.

23, (Computer assignment) If Proposition 2.11 is to hold when 0 = 1 tp, then the 
solution of AX = 0 must be the “zero vector” «0. Set A — 10 (pO and compute BA and 
(10p0)84 at a terminal 


3.4 Branching and Recursion


In this section we discuss some advanced APL function-writing techniques. The
object is to develop enough APL to enable us to write a function ECHELON that
will take a matrix for an argument and return the echelon form of the matrix. The
material on branching and comparison tolerance below will allow us to do this in
Section 3.5. The material on recursion is not needed for 3.5 but will be needed in
later chapters.


BRANCHING 


In a complex process, such as row-reduction, not all operations are known in
advance. For example, one does not know where the third pivot column in a
row-reduction will occur until the first and second pivot operations have been
performed. A function to row-reduce a matrix must be able to take different
actions for different matrices. This is the purpose of branching or GO TO instruc-
tions in computer languages.

Let us look at a simple example. In applications one often encounters
“piecewise” functions: functions that are given by different formulas over dif-
ferent intervals, such as

$$f(x) = \begin{cases} 0 & \text{if } x < 0 \\ x^2 & \text{if } x \ge 0 \end{cases} \tag{3.1}$$

A circuit element, for example, might give no output for negative voltage
input and a nonlinear output for positive voltage input.

An APL function capable of applying one formula when x < 0 and another
formula when x ≥ 0 could be used to compute a function such as f(x). In APL
such functions are written using the symbol →, called branch or GO TO.

The expression

      →3

in an APL function means “GO TO line 3 instead of the next line.” Now → is not
quite a normal APL function, but it obeys the same rule as a monadic APL
function: it operates on the result of the expression to its right. For example,

      →0 1 0/1 2 3


34° Branching and Recursion 167 


means “GO TO line 2.” This means that one can jump to different lines, depend-
ing on circumstances. Before looking at examples of this, we need to be more
precise about the operation of the monadic function →.

1. The right argument of → is a scalar or vector.

2. →0 or “GO TO line zero” means “The computation has been completed;
return the result.” Nonexistent line numbers have the same effect. If n is not a line
number for the function, then →n is the same as →0.

3. →⍳0 has no effect. The function just passes on to the next line as if the → were
not there.

4. If V is a vector but not ⍳0, then →V is the same as →V[1]. The other compo-
nents of V, if any, are ignored.


For example,

      →(X<0)/0

will be →0 if X is less than zero and →⍳0 if X is not less than zero. Thus the
expression reads “Stop if X is less than zero; otherwise, go on,” and the function
f(x) of Equation (3.1) can be written using branching as

      ∇Z←F X
[1]   Z←0
[2]   →(X<0)/0
[3]   Z←X*2
      ∇
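Python has no GO TO, but the four rules for → can be modeled directly. In this sketch (our own model; `run` and `compress` are hypothetical helpers), a function body is a list of "lines", each of which returns either None (no branch) or the vector handed to the arrow. The function F of Equation (3.1) is then written line for line in the stop-or-continue style.

```python
def run(body, env):
    """Execute a list of 'lines' using the branching rules listed above.

    Each line is a function of env. It returns None (no branch) or a list of
    line numbers, the value handed to the arrow.
    """
    n = 1                                   # APL lines are numbered from 1
    while 1 <= n <= len(body):
        target = body[n - 1](env)
        if target is None or len(target) == 0:
            n += 1                          # no branch, or an empty vector (rule 3)
        else:
            n = target[0]                   # only the first component counts (rule 4);
                                            # 0 or a nonexistent line ends the run (rule 2)
    return env.get("Z")


def compress(mask, values):
    """APL-style compression: keep values[i] where mask[i] is 1."""
    return [v for m, v in zip(mask, values) if m]


def F(x):
    """f of Equation (3.1), written in the 'stop or continue' style."""
    body = [
        lambda e: e.update(Z=0),                 # [1] Z←0
        lambda e: compress([e["X"] < 0], [0]),   # [2] →(X<0)/0
        lambda e: e.update(Z=e["X"] ** 2),       # [3] Z←X*2
    ]
    return run(body, {"X": x})


print(F(-2), F(3))   # 0 9
```

For instance, `compress([0, 1, 0], [1, 2, 3])` gives `[2]`, the model of →0 1 0/1 2 3 branching to line 2.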


Some further examples are given below. 


EXAMPLE 3.16 Use branching to define an APL function to compute

$$f(x) = \begin{cases} x & \text{if } x < 1 \\ x^2 & \text{if } x \ge 1 \end{cases}$$

Solution
      ∇Z←F X
[1]   Z←X
[2]   →(X<1)/0
[3]   Z←X*2
      ∇




EXAMPLE 3.17 Use branching to define an APL function to compute

$$f(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$

Solution
      ∇Z←F X
[1]   Z←0
[2]   →(X<0)/0
[3]   Z←X
[4]   →(X≤1)/0
[5]   Z←1
      ∇


EXAMPLE 3.18 Use branching to define an APL function to compute

$$f(x, y) = \begin{cases} 0 & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ xy(1-x)(1-y) & \text{otherwise} \end{cases}$$

Solution
      ∇Z←X F Y
[1]   Z←0
[2]   →((0≤X)∧(X≤1)∧(0≤Y)∧Y≤1)/0
[3]   Z←X×Y×(1-X)×1-Y
      ∇


In none of the examples above was a jump to an actual line number used.
The “stop or continue” form used above suffices for most simple situations. A
version of Example 3.16, the version most beginners would write, can be used to
illustrate branching to a line number:


      ∇Z←F X
[1]   →(X≥1)/4
[2]   Z←X
[3]   →0
[4]   Z←X*2
      ∇


(This version first checks to see if X ≥ 1 or X < 1 and then applies the appropri-
ate formula. As the given solution to Example 3.16 shows, it is usually simpler to
assume one case, for example X < 1, apply the appropriate formula, and
then change to another formula if necessary.)




Expressions of the form →(X≥1)/4 are often inconvenient, because editing the
function may cause the line numbers to change. Then each line containing a →
must be checked to see if it is still valid. We can avoid this problem by labeling
statements. A preferred version of F is

      ∇Z←F X
[1]   →(X≥1)/POS
[2]   Z←X
[3]   →0
[4]   POS:Z←X*2
      ∇


The line POS:Z←X*2 contains the label POS. A label is a variable name begin-
ning a line and separated from the rest of the line by a colon (:). It is simply a
special type of local variable that does not need to be declared in the header and
whose value is always the line number to its left. If this last version of F is edited
and line numbers are changed, the branch statement on line [1] need not be
changed.


Comparison Tolerance (Fuzz)
Machine computations are not exact. Numbers that should be zero are merely
small.

      1-3×÷3
1.734723476E¯18

The display above will vary from machine to machine, but it will rarely be
precisely zero. On the other hand, one would like an expression such as 1=3×÷3
to test out to be true anyway:

      1=3×÷3
1

This is accomplished by allowing the function = (as well as ≥, ≤, and so on) to
ignore small enough differences. The size of the differences ignored is set by the
“system variable” ⎕CT, whose size in turn varies with the particular machine and
can be changed by the user. To find out its value for a particular machine one
simply types ⎕CT. The number ⎕CT is called the comparison tolerance or fuzz. It
can be thought of as the largest number for which the expressions

      1=1+⎕CT
      1=1-⎕CT

are “true.”




For example, if ⎕CT is $10^{-4}$, then 1 will be considered equal to 1.0001.

      ⎕CT←1E¯4
      1=1.0001
1

⎕CT may take any value between 0 and 1, 0 included and 1 excluded.

More generally, A=B will return a 1 as long as |A − B| is less than M×⎕CT,†
where M is the absolute value of A or the absolute value of B, whichever is greater
(M←⌈/|A,B).

If A=B returns a 1, we say that A is fuzzily equal to B.


EXAMPLE 3.19 Set ⎕CT←1E¯4. Which of the following pairs of numbers are fuzzily
equal?

(a) 1 and 1.0001.  (b) 100 and 100.01.
(c) $10^{-5}$ and $10^{-6}$.  (d) 0 and $10^{-100}$.
Solutions

(a) If A←1 and B←1.0001, then B − A is $10^{-4}$, M is 1.0001, and M×⎕CT is thus a
bit more than $10^{-4}$. A and B are fuzzily equal.

(b) A←100, B←100.01, B − A is $10^{-2}$, M is 100.01, and so M×⎕CT is a bit larger
than $10^{-2}$. A and B are fuzzily equal.

(c) A←.00001, B←.000001, and A − B is .000009. M is $10^{-5}$, and so M×⎕CT is
$10^{-9}$, which is less than $9 \times 10^{-6}$. A and B are not fuzzily equal.

(d) A←0, B←1E¯100, and B − A is $10^{-100}$. M is also $10^{-100}$, and so M×⎕CT is
$10^{-104}$, which is less than $10^{-100}$. A is not fuzzily equal to B.
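The rule for fuzzy equality translates into a few lines of Python. This is a sketch of the simplified rule stated above, not of any APL system's exact algorithm (the footnote warns that machine representation adds complications); `fuzzy_equal` is a hypothetical name, and the shortcut `a == b` is our addition so that 0 equals 0 even though M is then 0. It reproduces the four answers of Example 3.19.

```python
def fuzzy_equal(a, b, ct):
    """a and b are fuzzily equal if they are identical or |a - b| < M * ct,
    where M = max(|a|, |b|)."""
    if a == b:
        return True
    m = max(abs(a), abs(b))
    return abs(a - b) < m * ct


ct = 1e-4                                   # the ⎕CT←1E¯4 of Example 3.19
pairs = {"a": (1, 1.0001), "b": (100, 100.01), "c": (1e-5, 1e-6), "d": (0, 1e-100)}
for label, (a, b) in pairs.items():
    print(label, fuzzy_equal(a, b, ct))     # a True, b True, c False, d False
```

Part (d) shows why the tolerance is relative: when one number is 0, M×⎕CT is tiny, so nothing but 0 itself is fuzzily equal to 0.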


Example 3.19(d) illustrates a general fact. The expression 0=A returns a 1 only
if A is precisely zero. Thus 1=1+⎕CT returns a 1, but 0=⎕CT returns a 0.

We will confine our use of fuzzy equality for the most part to a single form.
When we wish to test if the quantity B is negligible compared to the quantity A,
we will use the expression

      A=A+B

DEFINITION 3.4 We say that B is negligible compared to A if A is fuzzily equal to
A+B.


We have used the setting ⎕CT←1E¯4 for illustration only. The fuzz should
almost never be changed from its default value. This value varies from machine to
machine and is set to the value thought to be the best choice for the majority of
computations.


† This is not quite true. We are ignoring complications that arise from the machine representation of
numbers.




RECURSION 


Mathematical induction is a concept that is particularly important for linear alge- 
bra. Proofs and definitions often proceed by induction. For example, the proof of 
the existence of the echelon form of a matrix (Proposition 3.3) was an inductive 
proof. Inductive definitions are often called recursive, especially in a computer 
science context. An APL function that is defined by mathematical induction is 
called a recursive function. 

A simple example is the factorial function, written !n in APL notation and n!
otherwise. The definition for n a nonnegative integer is

$$n! = n(n-1)(n-2)\cdots 2\cdot 1 \quad (n > 0), \qquad 0! = 1$$

or ×/⍳n.
This function can be defined inductively as

$$f(n) = \begin{cases} 1 & \text{if } n = 0 \\ n\,f(n-1) & \text{if } n > 0 \end{cases}$$

Such definitions have a strict form consisting of two parts:


1. The specification of the starting value. 
2. The expression of the nth value in terms of previous values. 


The APL version of the function f(n) is

      ∇Z←FACT N
[1]   Z←1
[2]   →(N=0)/0
[3]   Z←N×FACT N-1
      ∇

The specification of the starting value is on line [1]. The function continues only if
n > 0. On line [3] the function is called again with a smaller value of n. Without
the starting value the function would call itself forever, or until it filled the
workspace.

The above function is not necessary in APL, since the factorial is monadic !.
However, no primitive function will give us the nth power of a square matrix.
Notice that we could define $A^n$ as

$$A^n = \begin{cases} I & \text{if } n = 0 \\ AA^{n-1} & \text{if } n > 0 \end{cases}$$

where I←ID 1↑⍴A.




      ∇Z←A TOTHE N
[1]   Z←ID 1↑⍴A
[2]   →(0=N)/0
[3]   Z←A+.×A TOTHE N-1
      ∇
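The same recursive definition of $A^n$ can be sketched in Python with lists of rows. The names `identity`, `matmul` and `to_the` are our own (hypothetical); `to_the` plays the role of TOTHE, with the identity matrix as the starting value. As a usage example, powers of the matrix with rows (1, 1) and (1, 0) produce Fibonacci numbers.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def to_the(A, n):
    """A to the nth power, following the recursive definition above:
    I if n = 0, otherwise A times A to the (n - 1)."""
    if n == 0:
        return identity(len(A))     # starting value
    return matmul(A, to_the(A, n - 1))


print(to_the([[1, 1], [1, 0]], 10))   # [[89, 55], [55, 34]]
```

Like TOTHE, this uses n matrix products; the recursion depth grows with n, which is harmless for the small powers used here.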


We will show in Section 3.6* how powers of a matrix arise in applications. First,
however, we give some more examples of recursive functions.
EXAMPLE 3.20 Compute x5 if 
$$x_n = \begin{cases} 1 & \text{if } n = 1 \\ x_{n-1}\left(1 - \dfrac{x_{n-1}}{2}\right) & \text{if } n > 1 \end{cases}$$


Solution
      ∇Z←F N
[1]   Z←1
[2]   →(N=1)/0
[3]   Z←F N-1
[4]   Z←Z×1-Z÷2
      ∇


F 80 
0 04662 . 


EXAMPLE 3.21 Let $x_n$ be defined as in Example 3.20. Compute the vector
$(x_1, x_2, \ldots, x_{10})$ and the vector $(x_{40}, \ldots, x_{50})$.


Solution Write a function VF N that computes the vector $(x_1, \ldots, x_N)$:

      ∇Z←VF N
[1]   Z←1
[2]   →(N=1)/0
[3]   Z←VF N-1
[4]   Z←Z,Z[N-1]×1-Z[N-1]÷2
      ∇


      VF 10
1 0.5 0.375 0.3047 0.2583 0.2249 0.1996 0.1797 0.1636
0.1502
Ive 50 
0.04489 0.04388 0.04292 0.042 0.04112 0.04027 0.03946
0.03868 0.03793 0.03721 0.03652


On line [3] of the solution VF N-1, or $(x_1, \ldots, x_{N-1})$, is computed. Then VF N
is just $(x_1, \ldots, x_{N-1}, x_{N-1}(1 - x_{N-1}/2))$. The sequence $x_n$ appears to be decreasing.
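A Python sketch of VF follows the same plan: compute the first n − 1 terms recursively, then append one more. The name `vf` is our own. The rounded values it prints can be compared with the VF 10 display, and `vf(50)` gives the tail of the sequence.

```python
def vf(n):
    """Return [x_1, ..., x_n] for x_1 = 1, x_k = x_{k-1} * (1 - x_{k-1} / 2).

    Mirrors VF: compute the first n - 1 terms recursively, then append one more.
    """
    if n == 1:
        return [1.0]
    z = vf(n - 1)
    return z + [z[-1] * (1 - z[-1] / 2)]


print([round(x, 4) for x in vf(10)])
print([round(x, 5) for x in vf(50)[39:]])   # x_40, ..., x_50
```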




A more natural coding for the function F in Example 3.20 would be

      ∇Z←F1 N
[1]   Z←1
[2]   →(N=1)/0
[3]   Z←(F1 N-1)×1-(F1 N-1)÷2
      ∇


This function will work, but F1 costs more to use than F, because it involves
more function calls and hence more computation.

How many more function calls? Let c(n) be the number of times F is
called to compute F n, and let c1(n) be the number of times F1 is called to compute
F1 n.

First we get a formula for c(n). This is not particularly hard. c(1) = 1, because
F stops on line [2]. If n > 1, then F n involves the original function call, and then
on line [3] we get c(n − 1) more function calls. Thus

$$c(n) = \begin{cases} 1 & \text{if } n = 1 \\ 1 + c(n-1) & \text{if } n > 1 \end{cases}$$

Thus c(1) = 1, c(2) = 1 + 1 = 2, c(3) = 1 + 2 = 3, and c(n) = n.
Next we calculate c1(n). Again c1(1) = 1. For c1(n) we have the original
function call plus 2c1(n − 1) function calls on line [3]. Thus

$$c_1(n) = \begin{cases} 1 & \text{if } n = 1 \\ 1 + 2c_1(n-1) & \text{if } n > 1 \end{cases}$$

A formula for c1(n) is not so obvious, but we can get an idea of how it grows easily
enough.


      ∇Z←VC1 N
[1]   Z←1
[2]   →(N=1)/0
[3]   Z←VC1 N-1
[4]   Z←Z,1+2×Z[N-1]
      ∇

      VC1 20
1 3 7 15 31 63 127 255 511 1023 2047 4095 8191 16380
32770 65540 131100 262100 524300 1049000


Thus to compute $x_{20}$ the function F uses twenty function calls and the func-
tion F1 uses more than a million.
A formula for c1(n) may be found using the methods of Section 3.6*.
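The call counts can also be measured directly. This Python sketch (our own instrumentation; the counter dictionary `calls` is hypothetical) codes F and F1 recursively and counts every call. For n = 20 it reports 20 and 1,048,575 calls, and the latter equals $2^{20} - 1$; one can check that $2^n - 1$ satisfies the recurrence for c1(n), since $1 + 2(2^{n-1} - 1) = 2^n - 1$.

```python
calls = {"F": 0, "F1": 0}


def F(n):
    """Example 3.20 style: one recursive call per level."""
    calls["F"] += 1
    if n == 1:
        return 1.0
    z = F(n - 1)
    return z * (1 - z / 2)


def F1(n):
    """The 'more natural' coding: F1(n - 1) is evaluated twice per level."""
    calls["F1"] += 1
    if n == 1:
        return 1.0
    return F1(n - 1) * (1 - F1(n - 1) / 2)


x = F(20)
y = F1(20)
print(calls)            # {'F': 20, 'F1': 1048575}
```

Both functions return the same number; only the amount of work differs, which is why F stores the value of the recursive call before using it twice.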




EXERCISES 3.4 


Use branching to define APL functions to compute the functions given in exercises 1
through 8.


1. y ites} a (ax ifixj<t 
$9)= (00 ites) 4 VE=T if|s|>1 
3 x-1 ifxg=t 4 ae 
fy =|-Vi- if-1<x<t 
x-1 ifx> DAVES SOE 
5 xiiamigx se news 6 yy ok? ifixj+ <1 
-xifn—1<x<n, mod Y= (a2 + 52 otherwise 
7 fy axe +4? bay <9 8 _/0 fe + 721 
{I= 19 otherwise fo = (=a omnerwise 


For exercises 9 through 17 assume that ⎕CT←1E¯4.


9. Are 10,000 and 10,001 fuzzily equal?

10. Are −10,000 and −10,001 fuzzily equal?
11. Are 2 and 2.0001 fuzzily equal?

12. Are 1 and 1.0002 fuzzily equal?

13. Is −.1 negligible compared to 10,000?
14. Is −.1 negligible compared to 1000?

15. Is 1 negligible compared to 1000?

16. 1s 10° negligible compared 10 101% 


17. The set of points fuzzily equal to 10,000 forms an interval [a, b]. Find a and b. Is
10,000 in the center of this interval?


Hint: Write b = 10,000 + ε and find an upper bound for ε.


(Computer assignment) In exercises 18 through 25 compute $x_1, x_2, \ldots, x_N$ for the given N.
1B. x, =2 19. x, 
Xq = hat — Xp Fn = Xe + N51 
N=7 N 
20. x, =1 20 oe 
x 
N 
2 Bi Tx 
N= 
24, 25. x 
Xy = 5, 
N=12 
=f 9996 0002 J» 
26. Compute | ‘done 10003 for n = 2000; 10,000; 100,000. 


35° Automating Gaussian Reduction 175 


3.5. Automating Gaussian Reduction 


If A is a matrix, then there is an invertible matrix G such that E = GA is in 
echelon form. In this section we will define a function GAUSS such that 
G = GAUSS A and a function ECHELON such that E = ECHELON A. 

Machine arithmetic is not exact, and during row-reductions the errors may
sometimes accumulate in such a way as to produce a totally erroneous answer.
For this reason our automatic version of Gaussian reduction will have an error
check built in, the same error check we used in Section 3.2. To compute the
echelon form of a matrix A we form the partitioned matrix [A | I] and row-reduce
to [E | G], where E is an echelon form. The function ECHELON will not return
E, however. The function ECHELON will return GA. If GA is “sufficiently close”
to an echelon form, we accept the answer. If it is not close enough for our liking,
other methods must be tried. These methods involve numerical analysis beyond
the scope of this text. If A is a small matrix, however, then step-by-step use of the
functions SWITCH, PIVOT, and LDR will often reveal the problem.

Since G is an invertible matrix, GA has the same echelon form as A (exercise
22 of Exercises 3.3), and a second application of the functions will occasionally
produce a better-looking answer.

The main function to be defined is GAUSS. In fact, ECHELON is just

      ∇Z←ECHELON A
[1]   Z←(GAUSS A)+.×A
      ∇


PARTIAL PIVOTING 


In row-reducing small matrices by hand, one usually pivots on whichever entry
involves the least work, the entry that involves the simplest fractions, for exam-
ple. In automatic processes, however, one attempts to choose pivots in such a way
as to minimize the arithmetic errors committed by the machine. The strategy we
will employ is called partial pivoting. In partial pivoting one uses row interchanges
(i.e., the function SWITCH) to bring the entry of largest magnitude (absolute
value) to the pivot position. (Full pivoting involves column interchanges as well.)

For example, suppose that after the first column of A has been cleaned up, 
one has 


$$\begin{bmatrix} 1 & 100 & 30 \\ 0 & 1 & 6 \\ 0 & 14 & -20 \\ 0 & -30 & 7 \end{bmatrix}$$


Row-reducing by hand, one would now pivot on the 2;2 position. Employing the 
partial-pivoting strategy, however, one first switches rows 2 and 4 to put the 


largest (in absolute value) available number into the pivot position. The entry in 




the 1;2 position is larger, but of course it is not available: using it would disturb
the form of the first column.

Partial pivoting is not difficult to code into a row-reduction function. One
must search for a nonzero pivot anyway, and so one may as well take the largest available magni-
tude to pivot on.


DYADIC ⍳


To implement the partial-pivoting strategy we must be able to locate that compo-
nent of a vector with the largest absolute value. Finding the largest absolute value
in a vector is not difficult. If V is a vector, then |V gives the absolute value of each
component, and a reduction using the maximum function will pick out the largest
entry: ⌈/|V.

This is not quite what we want, however. It gives us the size of the desired
pivot quantity but does not tell us what row the quantity appears in. We can find
the row index by using the equality function and compression,† but it is simpler to
use the dyadic function ⍳, called index-of. The left argument of ⍳ must be a vector V.
If the right argument of ⍳ is a scalar A, then V⍳A gives the index of the first
occurrence of A in V. If A does not occur as a component of V, then V⍳A is 1+⍴V.


Va as 
3 
Geretrord 
2 
12a 
4 
If A is an array other than a scalar, then the action is componentwise. 
A 
1 
3 
12300 
14 
an) 


Thus, to find the index of the (first-occurring) component of largest absolute
value in the vector V we may use the expression

      (|V)⍳⌈/|V
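The index-of function and the pivot search it enables can be modeled in a few lines of Python. This is a sketch under our own naming (`index_of` and `index_of_largest_magnitude` are hypothetical); indices are 1-based to match APL's default. Applied to the entries 1, 14, −30 below the first row in the second column of the partial-pivoting example, it returns 3, that is, row 1 + 3 = 4, the row that is switched into the pivot position.

```python
def index_of(v, a):
    """Model of dyadic ⍳ with a scalar right argument (1-based):
    index of the first occurrence of a in v, or 1 + len(v) if a is absent."""
    for i, x in enumerate(v, start=1):
        if x == a:
            return i
    return len(v) + 1


def index_of_largest_magnitude(v):
    """The analogue of (|V)⍳⌈/|V."""
    mags = [abs(x) for x in v]
    return index_of(mags, max(mags))


# Second column below the first row of the partial-pivoting example: 1, 14, -30.
print(index_of_largest_magnitude([1, 14, -30]))   # 3, i.e. row 1 + 3 = 4 of the matrix
```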


Next we ask just what is meant by “zero” in the machine version of row- 
reduction. Consider the row-reduction: 


† If P←⌈/|V, then 1↑(P=|V)/⍳⍴V is the row index.




A 
AA 
LI 
a9 
+A-t 1 LOR 1 4 PIVOT 1 3 SWITCH A 
1.00 1 160 1.360 
69618 4361 8.661 
1.7618 8 .6E1 1.760 
+An2 2 LOR 2 2 PIVOT 2 3 SWITCH A 
1.020 1.7618 1.060 
2.0618 1.060 2.00 
B.1E18 8. 7E19 176-18 


(These numbers will vary with the brand of computer used.) 

We want our Gaussian reduction function to stop at this point, because the
last row is “small enough” to be assumed zero. The definition of “small enough”
we will use is negligible compared to A. In Section 3.4 we defined a number C to
be negligible compared to B if the expression B=B+C is true in the sense that the
result is a 1. (This depends upon the value of ⎕CT.)

Of course A is a matrix, not a single number, and so we will use the largest
magnitude in A to set the scale. In the case above the largest magnitude in the
original A is 9. The expression 9=9+|A shows which components of the current A
are negligible compared to 9.
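The scaling test can be sketched in Python as an entrywise mask. Everything here is illustrative: `negligible` and `negligible_mask` are hypothetical names, the tolerance 1e-13 stands in for ⎕CT and is our assumption, and the "current matrix" holds made-up values of the kind a reduction leaves behind, not the book's display. With the scale 9, the entries of order 1E¯17 are flagged as negligible and the genuine entries are not.

```python
def negligible(x, s, ct):
    """x is negligible compared to s when s = s + x holds fuzzily:
    identical, or |(s + x) - s| < max(|s|, |s + x|) * ct."""
    t = s + x
    if t == s:
        return True
    return abs(t - s) < max(abs(s), abs(t)) * ct


def negligible_mask(A, s, ct):
    """Entrywise analogue of s=s+|A|: 1 where |A[i][j]| is negligible compared to s."""
    return [[int(negligible(abs(x), s, ct)) for x in row] for row in A]


# Hypothetical current matrix after two pivot steps (illustrative values only)
current = [[1.0, 2e-17, 1.0],
           [3e-18, 1.0, 2.0],
           [5e-18, 7e-19, 1e-17]]
print(negligible_mask(current, 9, 1e-13))
# [[0, 1, 0], [1, 0, 0], [1, 1, 1]]
```

A row of all 1s, like the last one, is exactly the situation in which the reduction should stop.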


GAUSS
We are now in a position to define the function GAUSS. It is

      ∇G←GAUSS A;S;P;L;T;B;V;K;R
[1]   S←⌈/,|A
[2]   P←L←0
[3]   T←⍴A
[4]   A←A,ID 1↑T
[5]   CYCLE:V←+⌿B←|(L,P)↓T↑A
[6]   →((⍴V)<K←(S=S+V)⍳0)/END
[7]   R←L+B[;K]⍳⌈/B[;K]
[8]   A←((L+1),P+K)LDR((L+1),P+K)PIVOT((L+1),R)SWITCH A
[9]   →CYCLE,(L←L+1),P←P+K
[10]  END:G←(-T[1 1])↑A
      ∇




The function closely follows the proof of Proposition 3.3, the proof that the
row-reduction process works. In that proof at the end of the lth cycle we had a
matrix $A_l$ such that

(i) The first p columns of $A_l$ are an echelon form with l nonzero rows.

(ii) If the matrix $B_l$ obtained by dropping the first l rows and p columns from $A_l$
is a zero matrix, then $A_l$ is in row echelon form.

In the function GAUSS the matrix A is the augmented matrix $[A_l \mid G_l]$, where
$G_l$ is the product of the elementary matrices used for the first l cycles. We obtain
the matrix B from $B_l$ by replacing each component by its absolute value.

On line [1] the scale is set for fuzzy comparisons. The scalar S is the largest
absolute value in A. The numbers l and p are initialized to zero on line [2], and A
is augmented by an identity matrix on line [4].

According to the proof of Proposition 3.3 we look at $B_l$. If $B_l = 0$, we are
done. If $B_l \ne 0$, we must find the first nonzero column of $B_l$. Lines [5] and [6]
accomplish this. The vector V is obtained by summing the columns of B, the
absolute value of $B_l$. Zero columns of $B_l$ give rise to zero components of V. On line
[6] K is defined to be the index of the first component of V that is not negligible
compared to S.

If $B_l = 0$, then K is 1+⍴V and we are done. In this case the function jumps
to line [10] and returns $G_l$. If $B_l \ne 0$, then $B_l[;K]$ is the first nonzero column of $B_l$.
On line [7] the largest component of B[;K] is located, and the cycle is completed on
line [8]. On line [9] the numbers l and p are updated and the function begins the
next cycle. (Recall that →W looks only at the first component of the vector W.)

The functions are simple to use.


a 


+G-GAUSS A 
130 000 0.39 
1.20 0.00 0 17 
0.50 1.00 0 50 


Gt KA 
1.060 0.060 1 0£0 
1.7E18 1.060 2400 
Q5E18 0 060 3.5618 
ECHELON A 
1 0£0 0060 1 00 
17618 1.060 2 060 
S5E18 0.060 3 5E18 
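For readers without an APL system, here is a Python sketch that follows the ten lines of GAUSS as described above. It is our own translation under stated assumptions: `gauss` and `matmul` are hypothetical names, matrices are plain lists of rows, the switch, pivot and leading-one steps are done inline rather than through SWITCH, PIVOT and LDR, and `ct` stands in for ⎕CT with an assumed default of 1e-13.

```python
def gauss(A, ct=1e-13):
    """Sketch of GAUSS: return G with G*A (approximately) in reduced echelon form."""
    m, n = len(A), len(A[0])
    S = max(abs(x) for row in A for x in row)             # [1] scale

    def negligible(x):                                    # S = S + x, fuzzily (x >= 0)
        return S + x == S or abs(x) < (S + abs(x)) * ct

    M = [[float(x) for x in row] + [float(i == j) for j in range(m)]
         for i, row in enumerate(A)]                      # [4] augment by I
    l = p = 0                                             # [2]
    while True:
        # [5] V: column sums of |B_l|, the A block without its first l rows, p columns
        V = [sum(abs(M[i][j]) for i in range(l, m)) for j in range(p, n)]
        # [6] K: first column of B_l that is not negligible compared to S
        K = next((k for k, v in enumerate(V) if not negligible(v)), None)
        if K is None:
            break                                         # B_l negligible: done
        col = p + K
        # [7] partial pivoting: row of largest magnitude in that column
        R = max(range(l, m), key=lambda i: abs(M[i][col]))
        # [8] switch rows, make a leading 1, clear the rest of the column
        M[l], M[R] = M[R], M[l]
        piv = M[l][col]
        M[l] = [x / piv for x in M[l]]
        for i in range(m):
            if i != l:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[l])]
        l, p = l + 1, col + 1                             # [9] next cycle
    return [row[n:] for row in M]                         # [10] G block


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2, 1], [2, 4, 0], [3, 6, 1]]
G = gauss(A)
E = matmul(G, A)      # analogue of ECHELON A, i.e. (GAUSS A)+.×A
for row in E:
    print([round(x, 12) for x in row])
```

For this matrix the rows of E are, up to roundoff, (1, 2, 0), (0, 0, 1) and (0, 0, 0). As in the book, it is GA that is returned as the echelon form, so any roundoff in G shows up directly in the result and can be judged there.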




Writing down the solution of a system of linear equations, given the echelon
form, is a special case of the process known as back-substitution. A back-substitu-
tion function is defined in Chapter 5.

If we wish to use the function ECHELON within another function, we need 
an automatic way of checking the acceptability of the result of ECHELON. A way 
of doing this is provided in exercise 13. 


EXERCISES 3.5 


(Computer assignment) In exercises 1 through 5 use ECHELON to solve the system of 
equations. 


IS s+ aye 3 


4 Bi Dey — My ty 3K 3 


Sx+ Oy+ 2= 8 =H tay t+ x 1 
9x + ly + Hz = 12 Sy My— Ay Ny ta 
3 Set y+ 132 425 =2 4, 29x + 330) + 6092 = 861 
ye r+ Sr=0 ~S83x — 206) — 489: = — 1004 
dy + 4+ Bal =23n + B6y + TSE 65 
64x — 92y + -35 
Wax — 78y + 432 66 
5. = 128Lx, + 1123x, ~ 59x, — 1496x, 37 
=Tix; + 50%, + 6lx,— 15x, 2 
43x, + 37x, — 2x, + 92x,= 99 
22x,+ Tax, — 2lxy— 15x, 47 
89x, — 60x, + MK + 734, 20 


(Computer assignment) In exercises 6 through 10 use GAUSS to compute the inverse, if one
of the specified type exists.


6 : 2 ‘ 7. 2 8. [) 23 
A=|45 6 45 45 6 
78 -9 78 Right inverse 
Inverse Left inverse 
9, [ 434 -220 250 10. [“ —375 395 “| 
ee ee) 27 «93 84 
-2% $3 -30 2-99 29 76, 
12410910. Right inverse 
Left inverse 


11. (a) Let S be a scalar and V a vector. Show that the expression

      S∧.=S+V

is 1 if and only if the components of V are negligible compared to S.
(b) Let S be a scalar and A a matrix. Show that the expression

      +/S∧.=S+A




counts the number of columns of A that are negligible compared to S and that

      +/(S+A)∧.=S

does the same for the rows of A.
(c) Show that the expression

      +/(S+A)∨.≠S

counts the number of rows of A that are not negligible compared to S.


12. (Computer assignment) From exercise 11(c), the function

∇ Z←RANK A;S
[1] S←⌈/|,A
[2] Z←+/(S+ECHELON A)∨.≠S
∇


estimates the rank of a matrix A. Compute ECHELON A and RANK A for A equal to 

(a) 3 ano (b) 1€9%3 3008 (0) 1620x3 318 
13. (a) Show that if E is an echelon form and any columns of zeros are deleted from E, the resulting matrix is still in echelon form.

(b) Show that if E is an echelon form, so is 1 1↓E.

(c) Show that the function

∇ Z←S ECHCHK E
[1] E←(S∨.≠S+E)/E
[2] Z←1
[3] →(0∨.=⍴E)/0
[4] Z←(1=E[1;1])∧S∧.=S+1↓E[;1]
[5] Z←Z∧S ECHCHK 1 1↓E
∇

returns a 1 if E is an echelon form after components negligible with respect to S have been set to zero and returns a 0 otherwise.


14. (Computer assignment) The n-by-n Hilbert matrix H is defined by H[i;j] = 1÷(i+j−1).
(a) Write a function HILB such that HILB n is the n-by-n Hilbert matrix.
(b) Does the function ECHELON row-reduce HILB 20 to an acceptable echelon form? (Don't print it out: compute 1 ECHCHK E←ECHELON HILB 20, where ECHCHK is defined in exercise 13.)
(c) Let EE←ECHELON ECHELON HILB 20. Is EE an echelon form? [See part (b).]
(d) The Hilbert matrix is nonsingular, and the truncated version stored in the machine is probably nonsingular also. What is RANK EE, where EE is from part (c) and RANK is from exercise 12?




3.6* Powers of Matrices 


In Section 3.5 we indicated that the problem of analyzing the powers of a matrix 
was an important one. In this section we will give some reasons for its importance. 
We begin with a simple formula. 


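Proposition 3.8 Let $a$, $v_1$ be vectors in $\mathbb{R}^n$ and let $A$ be an n-by-n matrix. Define a sequence of vectors $v_n$ by

$$v_n = \begin{cases} v_1 & \text{if } n = 1 \\ a + A v_{n-1} & \text{if } n > 1 \end{cases}$$

Then

$$v_n = (I + A + \cdots + A^{n-2})\,a + A^{n-1} v_1, \qquad n > 1$$

If $I - A$ is invertible, then

$$v_n = (I - A)^{-1}(I - A^{n-1})\,a + A^{n-1} v_1, \qquad n > 1$$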
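Proof First we show that

$$v_n = (I + A + \cdots + A^{n-2})\,a + A^{n-1} v_1$$

The proof, of course, is by induction. If $n = 2$, this is

$$v_2 = Ia + A v_1 = a + A v_1$$

which is correct. So assume the formula is correct for $v_{n-1}$. Then

$$\begin{aligned} v_n &= a + A v_{n-1} \\ &= a + A\left[(I + A + \cdots + A^{n-3})\,a + A^{n-2} v_1\right] \\ &= (I + A + \cdots + A^{n-2})\,a + A^{n-1} v_1 \end{aligned}$$

and so the formula is also correct for $n$.
Now let $S = I + A + \cdots + A^{n-2}$. Then

$$AS = A + A^2 + \cdots + A^{n-1}$$

and

$$S - AS = (I - A)S = I - A^{n-1}$$

Thus, if $I - A$ is invertible,

$$S = (I - A)^{-1}(I - A^{n-1})$$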
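The first formula of Proposition 3.8 can be checked numerically with exact rational arithmetic. The following is a minimal Python sketch, not part of the book: the matrix A and the vectors a and v1 are arbitrary test data, and the helper names (mat_vec, v_recursive, v_closed_form, and so on) are our own.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Matrix-vector product A v (A given as a list of rows)."""
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def mat_mul(A, B):
    """Matrix product A B."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def identity(k):
    return [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]


def v_recursive(A, a, v1, n):
    """v_1 = v1 and v_n = a + A v_{n-1}, straight from the definition."""
    v = list(v1)
    for _ in range(n - 1):
        v = [ai + w for ai, w in zip(a, mat_vec(A, v))]
    return v


def v_closed_form(A, a, v1, n):
    """(I + A + ... + A^(n-2)) a + A^(n-1) v1, the first formula of Proposition 3.8."""
    k = len(A)
    S = [[Fraction(0)] * k for _ in range(k)]
    P = identity(k)
    for _ in range(n - 1):          # accumulate I, A, ..., A^(n-2) into S
        S = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(S, P)]
        P = mat_mul(P, A)           # on exit P = A^(n-1)
    return [x + y for x, y in zip(mat_vec(S, a), mat_vec(P, v1))]


# Hypothetical test data: any square A and vectors a, v1 will do.
A = [[Fraction(1, 2), Fraction(1, 4)], [Fraction(0), Fraction(1, 3)]]
a = [Fraction(1), Fraction(2)]
v1 = [Fraction(3), Fraction(-1)]
for n in range(1, 11):
    assert v_recursive(A, a, v1, n) == v_closed_form(A, a, v1, n)
```

Exact fractions are used so that the comparison is an equality test rather than a tolerance test.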




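EXAMPLE 3.22 In Section 3.4 we computed the first twenty values of

$$V(n) = \begin{cases} 1 & \text{if } n = 1 \\ 1 + 2V(n-1) & \text{if } n > 1 \end{cases}$$

If we take $A = 2$, $a = 1$, and $v_1 = 1$, then

$$V(n) = \frac{1 - 2^{n-1}}{1 - 2} + 2^{n-1} = 2^n - 1$$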
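The closed form can be compared against the recursion directly. This Python sketch assumes the recursion V(1) = 1, V(n) = 1 + 2V(n − 1), which is what $A = 2$, $a = 1$, $v_1 = 1$ give in Proposition 3.8. The function name V is ours.

```python
def V(n):
    """V(1) = 1 and V(n) = 1 + 2 V(n-1) for n > 1."""
    value = 1
    for _ in range(n - 1):
        value = 1 + 2 * value
    return value


# Proposition 3.8 with A = 2, a = 1, v1 = 1 predicts V(n) = 2**n - 1.
assert [V(n) for n in range(1, 21)] == [2**n - 1 for n in range(1, 21)]
```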


Proposition 3.8 applies to two wide classes of problems. 


LINEAR DIFFERENCE EQUATIONS 


A kth-order linear difference equation is a recursively defined sequence

$$x_n = a_1 x_{n-k} + a_2 x_{n-k+1} + \cdots + a_k x_{n-1}$$

The starting values $x_1, x_2, \ldots, x_k$ must be given.
For example, the Fibonacci numbers are defined by

$$x_n = x_{n-2} + x_{n-1}, \qquad x_1 = x_2 = 1$$

Thus $k = 2$ and $a_1 = a_2 = 1$.
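To apply Proposition 3.8 we set $v_1 = (x_1, \ldots, x_k)$, the starting values, and let $v_n = (x_n, x_{n+1}, \ldots, x_{n+k-1})$. Then

$$v_{n+1} = \begin{bmatrix} x_{n+1} \\ x_{n+2} \\ \vdots \\ x_{n+k-1} \\ x_{n+k} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_1 & a_2 & a_3 & \cdots & a_k \end{bmatrix} \begin{bmatrix} x_n \\ x_{n+1} \\ \vdots \\ x_{n+k-2} \\ x_{n+k-1} \end{bmatrix} = A v_n$$

Thus $v_n = A^{n-1} v_1$ (Proposition 3.8 with $a = 0$), and the study of such difference equations is reduced to the study of $A^n$. To study $A^n$ one uses the eigenvalues of $A$, defined in Chapter 7.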
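The companion-matrix form can be sketched in Python as follows. This is an illustration under our own naming (companion, apply, window are hypothetical helpers, not functions from the text): it builds the matrix above from the coefficients $a_1, \ldots, a_k$ and forms $v_n = A^{n-1} v_1$ by repeated multiplication.

```python
def companion(coeffs):
    """Matrix A with v_{n+1} = A v_n for x_n = a_1 x_{n-k} + ... + a_k x_{n-1}.

    coeffs is [a_1, ..., a_k]. Rows 1..k-1 shift the window of k terms;
    the last row applies the recursion.
    """
    k = len(coeffs)
    A = [[1 if j == i + 1 else 0 for j in range(k)] for i in range(k - 1)]
    A.append(list(coeffs))
    return A


def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def window(coeffs, start, n):
    """v_n = A^(n-1) v_1 with v_1 = start = (x_1, ..., x_k); returns (x_n, ..., x_{n+k-1})."""
    A = companion(coeffs)
    v = list(start)
    for _ in range(n - 1):
        v = apply(A, v)
    return v


# Fibonacci: k = 2, a_1 = a_2 = 1, x_1 = x_2 = 1.
print(companion([1, 1]))           # [[0, 1], [1, 1]]
print(window([1, 1], [1, 1], 19))  # [4181, 6765]
```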

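To compute the sequence $x_1, x_2, \ldots, x_n, \ldots$, however, it is easiest to use the original formulation, which states that $x_n$ is the inner product of the coefficients with the $k$ preceding terms:

$$x_n = (a_1, a_2, \ldots, a_k) \cdot (x_{n-k}, x_{n-k+1}, \ldots, x_{n-1})$$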


EXAMPLE 3.23 Compute the first twenty Fibonacci numbers.




Solution

∇ Z←FIB N
[1] Z←N⍴1
[2] →(N≤2)/0
[3] Z←FIB N−1
[4] Z←Z,+/¯2↑Z
∇

FIB 20
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
1597 2584 4181 6765
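A Python transcription of FIB follows. The correspondence with the APL lines in the comments is our reading of the listing, and the sketch is recursive only to mirror it. For long sequences a simple loop would be preferable in Python.

```python
def fib(n):
    """First n Fibonacci numbers, built the way the APL function FIB builds them."""
    if n <= 2:
        return [1] * n            # Z←N⍴1, and stop if N≤2
    z = fib(n - 1)                # Z←FIB N−1
    return z + [sum(z[-2:])]      # Z←Z,+/¯2↑Z: append the sum of the last two terms


print(fib(20))
```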


STOCHASTIC MATRICES


A matrix is called stochastic if its entries are nonnegative and the sum of entries in each row is 1. Such matrices arise often in applications. The entry A[i;j] usually gives the probability that if the system being studied is in state i, it will move to state j. The number A[i;j] is called a transition probability.

For example, suppose that a large number of fleas are hopping about on a designer sheet done in, say, three colors. Suppose that A[i;j] is the probability that a flea sitting on color i will have moved to color j at the end of 1 second. Now assume that there are $n_1$ fleas on color 1, $n_2$ fleas on color 2, and $n_3$ fleas on color 3. One second later the distribution of fleas will be

$n_1A[1;1] + n_2A[2;1] + n_3A[3;1]$ fleas on color 1
$n_1A[1;2] + n_2A[2;2] + n_3A[3;2]$ fleas on color 2
$n_1A[1;3] + n_2A[2;3] + n_3A[3;3]$ fleas on color 3


If we let the vector $v = (n_1, n_2, n_3)$ denote the situation in which there are $n_i$ fleas on color $i$, then after 1 second the distribution of fleas is v+.×A.

We can put this in the form of Proposition 3.8 by taking transposes, but it is hardly necessary. The distribution of fleas after $m$ seconds is v+.×$A^m$, and we analyze the system by analyzing the powers of the matrix $A$.

The matrix is called a Markov chain matrix. It can be shown that the rows of $A^n$ often all approach the same vector as $n \to \infty$. This vector usually represents the steady-state behavior of the system being analyzed.
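One second of hopping is the row vector v times A, which the text writes in APL as v+.×A. Here is a small Python sketch of that step. The flea counts are made-up illustration values, and the transition matrix is the one used in Example 3.24. Because each row of A sums to 1, the total number of fleas is preserved.

```python
from fractions import Fraction as F


def step(v, A):
    """One second of hopping: the row vector v times A (APL v+.×A)."""
    k = len(A)
    return [sum(v[i] * A[i][j] for i in range(k)) for j in range(k)]


def after(v, A, m):
    """Distribution after m seconds, v+.×A^m, by m single steps."""
    for _ in range(m):
        v = step(v, A)
    return v


# A[i][j]: chance that a flea on color i+1 is on color j+1 one second later.
A = [[F(1, 2), F(0), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 5), F(1, 5), F(3, 5)]]
print(after([90, 0, 0], A, 1))   # 45, 0, 45 fleas (as Fractions)
```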


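EXAMPLE 3.24 Let

$$A = \begin{bmatrix} \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac13 & \tfrac13 \\ \tfrac15 & \tfrac15 & \tfrac35 \end{bmatrix}$$

Does $A^n$ seem to approach a limit as $n \to \infty$?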




Solution Using the function TOTHE defined in Section 3.4, we have

⎕←A
0.5000 0.0000 0.5000
0.3333 0.3333 0.3333
0.2000 0.2000 0.6000

⎕←A20←A TOTHE 20
0.3158 0.1579 0.5263
0.3158 0.1579 0.5263
0.3158 0.1579 0.5263

So the rows of $A^{20}$ are all the same to four digits. Let us see if $A^{40}$ is fuzzily equal to $A^{20}$:

A20=A20+.×A20
0 0 0
0 0 0
0 0 0

Thus there are differences between $A^{20}$ and $A^{40}$. We can try a somewhat higher power:

⎕←A400←A20 TOTHE 20
0.3158 0.1579 0.5263
0.3158 0.1579 0.5263
0.3158 0.1579 0.5263

A400=A400+.×A400

So $A^{400}$ is fuzzily equal to $A^{800}$ (⎕CT is here about 3E¯15).
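The same experiment can be sketched in Python. TOTHE itself is not shown in this section, so the repeated-squaring power routine below is our own choice, not the book's function. The comparison vector (6/19, 3/19, 10/19) is the solution of pA = p whose entries sum to 1. It agrees with the four-digit rows 0.3158 0.1579 0.5263.

```python
def mat_mul(A, B):
    """Matrix product A B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def mat_pow(A, n):
    """A^n for n >= 1 by repeated squaring."""
    result = None
    base = A
    while n > 0:
        if n & 1:
            result = base if result is None else mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


A = [[0.5, 0.0, 0.5],
     [1 / 3, 1 / 3, 1 / 3],
     [0.2, 0.2, 0.6]]
A20 = mat_pow(A, 20)          # plays the role of A TOTHE 20
A400 = mat_pow(A20, 20)       # plays the role of A20 TOTHE 20
limit = [6 / 19, 3 / 19, 10 / 19]
for row in A20:
    print([round(x, 4) for x in row])   # [0.3158, 0.1579, 0.5263] for every row
```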


EXERCISES 3.6*


For exercises 1 through 4 write the difference equation in matrix form by finding the $v_1$, $a$, and $A$ of Proposition 3.8.


nes for all n. 


Hint: What is 4%? 




6. Use Proposition 3.8 to compute $x_n$ for exercise 3 above.
7. Use Proposition 3.8 to compute $x_n$ for exercise 2 above.


8. (Computer assignment) For the matrices A below does $A^n$ seem to converge to a matrix with constant rows as $n \to \infty$? (Notice that the rows sum to 1.)


Lol 03 94 1) #2) ies 
@ 4=|3 7 0 eI oe ee 
45 35 20 78 9 -23 

10 11 12) =32 


9. (Computer assignment) In the discussion following Example 3.23 assume that, of the fleas on color 1, 5 percent move to color 2 and 5 percent move to color 3. Of the fleas on color 2, 5 percent move to color 1 and 20 percent move to color 3. Of the fleas on color 3, 10 percent move to color 1 and 20 percent move to color 2. What is the long-term distribution of fleas?


3.7* Nonlinear Equations 


Nonlinear equations are much more difficult than linear equations. This section presents two procedures for estimating roots of nonlinear equations. The first method, sectioning, is quite satisfactory for finding roots of $f(x) = 0$ when $f: \mathbb{R} \to \mathbb{R}$ is easily computed. The second, Newton's method, applies to functions $f: \mathbb{R}^n \to \mathbb{R}^n$. Newton's method was discussed in Section 2.6*, and the discussion there assumes familiarity with partial differentiation. In this section we define a function that makes Newton's method easier to use. A fully automatic Newton's method is not attempted.


SECTIONING 


Let $f: \mathbb{R} \to \mathbb{R}$ and suppose we wish a root of $f(x)$. Assume that we can find numbers $a$ and $b$, with $a < b$, such that $f(a)$ and $f(b)$ have different signs. If the function $f$ is continuous, then by the intermediate-value theorem of elementary calculus there is a number $\xi$, $a < \xi < b$, such that $f(\xi) = 0$.

To estimate $\xi$ we chop the interval $[a, b]$ into subintervals (100 or 1000 subintervals, say†) and find the first subinterval on which $f$ changes sign; call it $[a_1, b_1]$ (see Figure 3.1). The process can be repeated with $[a_1, b_1]$ replacing $[a, b]$ until $\xi$ is located as accurately as desired. To subdivide $[a, b]$ into subintervals we can use the function CHOP from Example 3.4. Suppose that $f: \mathbb{R} \to \mathbb{R}$ has been defined as an APL function called FCN that works componentwise on vectors. Then

Y←FCN X←100 CHOP a,b


† Depending upon the storage available in your workspace.


FIGURE 3.1 ($f(a) < 0$, $f(b) > 0$)


defines X to be the vector of endpoints of the subintervals and Y to be the vector of values of $f$ at these points. To locate the subinterval $[a_1, b_1]$ we must find the first sign change in the vector Y.

Now the monadic function × gives the sign of a number: ×A is ¯1 if A is negative, 1 if A is positive, and 0 if A is zero. If k is such that (×Y[k−1])≠×Y[k] is the first sign change, then k is ((×Y)≠×Y[1])⍳1.† Thus $a_1, b_1$ is just

X[¯1 0+((×Y)≠×Y[1])⍳1]

Suppose we decide to accept as a root any $x$ such that $|f(x)| < E$ for a given small number $E$. Then the roots, if any, among the components of X are given by the compression

Z←(E>|Y)/X

If there are no "roots," then Z is ⍳0.
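Here is a Python sketch of the two building blocks just described: locating the first sign change and compressing out the accepted roots. It uses 0-origin indexing, unlike APL's default 1-origin, and the function names are ours. It assumes a sign change exists; otherwise next() raises StopIteration.

```python
def sign(x):
    """APL monadic ×: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def first_sign_change(X, Y):
    """Endpoints X[k-1], X[k] where the sign of Y first differs from the sign of Y[0].

    Corresponds to X[¯1 0+((×Y)≠×Y[1])⍳1].
    """
    s0 = sign(Y[0])
    k = next(i for i, y in enumerate(Y) if sign(y) != s0)
    return X[k - 1], X[k]


def accepted_roots(X, Y, E):
    """Compression (E>|Y)/X: the points whose function value is smaller than E in magnitude."""
    return [x for x, y in zip(X, Y) if abs(y) < E]


X = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [x * x - 2 for x in X]               # sample function x^2 - 2
print(first_sign_change(X, Y))           # (1.0, 1.5)
print(accepted_roots(X, Y, 0.3))         # [1.5]
```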
The function CENTSECT takes E and I = (a, b) as arguments and returns at least one "root," provided f(a) and f(b) have different signs.

∇ Z←E CENTSECT I;X;Y
[1] Y←FCN X←100 CHOP I
[2] I←X[¯1 0+((×Y)≠×Y[1])⍳1]
[3] →(0=⍴Z←(E>|Y)/X)/1
∇
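Here is a Python sketch of the same sectioning loop. It assumes that n CHOP a,b returns the n + 1 equally spaced points from a to b. Example 3.4 is not reproduced here, so that behaviour is inferred from how CHOP is used. The max_rounds cap is our addition: in floating point, an E below machine resolution could otherwise loop forever.

```python
import math


def chop(n, a, b):
    """n + 1 equally spaced points from a to b (assumed behaviour of n CHOP a,b)."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def sign(x):
    """APL monadic ×."""
    return (x > 0) - (x < 0)


def centsect(f, E, a, b, n=100, max_rounds=200):
    """Sectioning in the manner of CENTSECT; assumes f(a) and f(b) differ in sign.

    Returns every grid point with |f(x)| < E from the first grid that has one.
    """
    for _ in range(max_rounds):
        X = chop(n, a, b)                        # line [1]
        Y = [f(x) for x in X]
        roots = [x for x, y in zip(X, Y) if abs(y) < E]
        if roots:                                # line [3]: stop once Z is nonempty
            return roots
        s0 = sign(Y[0])                          # line [2]: narrow to the first sign change
        k = next(i for i, y in enumerate(Y) if sign(y) != s0)
        a, b = X[k - 1], X[k]
    raise RuntimeError("no grid point with |f(x)| < E; E may be below machine precision")


print(centsect(lambda x: x * x - math.sin(x), 1e-10, 0.1, math.pi))
```

Checking for roots before narrowing, as here, has the same effect as the APL order, because line [2] changes only I and not the X used on line [3].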


EXAMPLE 3.25 Find the roots of $f(x) = x^2 - \sin x$.


Solution $f(x) = 0$ if and only if $x^2 = \sin x$. Sketching the curves $y = x^2$ and $y = \sin x$ (Figure 3.2) indicates that in addition to the root $x = 0$, there is a root between 0 and $\pi$.


†The dyadic function ⍳ is discussed in Section 3.




FIGURE 3.2 


First define the function $f$:

∇ Z←FCN X
[1] Z←(X*2)−1○X
∇

Next we need a starting interval $[a, b]$. For $b$ we can take $\pi$. We need $a > 0$; otherwise we may just get $x = 0$, which we already know. Try the interval $[.1, \pi]$.


FCN .1,○1
¯0.08983 9.87

The signs of $f(a)$ and $f(b)$ are different, and so $[.1, \pi]$ will do. Taking $E = 10^{-10}$, we have

⎕←ANS←1E¯10 CENTSECT .1,○1
0.8767

FCN ANS
3.492E¯11


Notice that if we try $E = 10^{-15}$ we get

⎕←ANS←1E¯15 CENTSECT .1,○1
0.8767 0.8767 0.8767 0.8767 0.8767 0.8767

This indicates six valid cases. The range of x values is


ANS(1 61 
1.521E¯15

so they differ only in the fifteenth decimal place. All meet our criterion for a root: $|f| < 10^{-15}$.

FCN ANS
¯7.936E¯16 ¯4.562E¯16 ¯1.162E¯16 2.212E¯16 5.612E¯16
9.003E¯16


Many prefer a function that stops when the interval is smaller than a specified E. In this case one tests the length of I in line [3] of CENTSECT above and returns to [1] if E<|−/I (see exercise 1).


EXAMPLE 3.26 Find an approximate root of $1 + x + x^2 - 5x^4$ with $E = 10^{-10}$.
Solution A definition of FCN in this case is

∇ Z←FCN X
[1] Z←1+X+(X*2)−5×X*4
∇


To find an initial interval we first look at $f(k)$, $k = -10, \ldots, 10$:

FCN 20 CHOP ¯10 10
¯49910 ¯32730 ¯20420 ¯11960 ¯6449 ¯3104 ¯1267 ¯398 ¯77
¯4 1 ¯2 ¯73 ¯392 ¯1259 ¯3094 ¯6437 ¯11950 ¯20410
¯32710 ¯49890

From the sign changes we see that there is a root in $[-1, 0]$ and another in $[0, 1]$. In $[-1, 0]$ we have

⎕←ANS←1E¯10 CENTSECT ¯1 0
¯0.6256

FCN ANS
1.758E¯11

and in $[0, 1]$ we have

⎕←ANS←ANS,1E¯10 CENTSECT 0 1
¯0.6256 0.846 0.846 0.846 0.846 0.846 0.846 0.846 0.846
0.846 0.846 0.846 0.846 0.846 0.846 0.846 0.846 0.846
0.846 0.846 0.846 0.846
FCN ANS
1.758E¯11 9.528E¯11 8.586E¯11 7.644E¯11 6.702E¯11 5.76E¯11
4.818E¯11 3.876E¯11 2.934E¯11 1.992E¯11 1.05E¯11
1.076E¯12 ¯8.344E¯12 ¯1.776E¯11 ¯2.718E¯11 ¯3.66E¯11
¯4.602E¯11 ¯5.544E¯11 ¯6.486E¯11 ¯7.428E¯11 ¯8.37E¯11
¯9.312E¯11




NEWTON'S METHOD


Let $f: \mathbb{R}^n \to \mathbb{R}^n$ be of class $C^1$. In Section 2.6* the Newton's method iteration for a root of $f$ was defined as

$$x_n = x_{n-1} - \left(Df(x_{n-1})\right)^{-1} f(x_{n-1})$$

where $x_{n-1}$ is an approximation to a root and $x_n$ is the next approximation. Here $Df$ is the derivative (= Jacobian matrix) of $f$.

A convenient way to use this formula is to write a function that will display the result of N iterations as the rows of a matrix. Let the vector X0 be the initial guess at a root. An APL version of the formula above is

∇ Z←N NEWTON X0
[1] Z←(1,⍴X0)⍴X0
[2] →(N=0)/0
[3] Z←(N−1) NEWTON X0
[4] Z←Z,[1]Z[N;]−(FCN Z[N;])⌹DFCN Z[N;]
∇

The functions $f$ and $Df$ must be defined as the APL functions FCN and DFCN.
The next example was presented earlier as Example 2.53.
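The NEWTON function can be sketched in Python as follows. The small elimination routine solve stands in for APL's ⌹ and is our own helper, not a library call. FCN and DFCN are written out for the ellipse/hyperbola system of Example 3.27. As in the APL version, the result has N + 1 rows, starting with the initial guess.

```python
import math


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (the role of ⌹)."""
    n = len(b)
    A = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= factor * A[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][j] * x[j] for j in range(r + 1, n))) / A[r][r]
    return x


def newton(N, x0, fcn, dfcn):
    """Rows x0, x1, ..., xN of x_n = x_{n-1} - Df(x_{n-1})^(-1) f(x_{n-1})."""
    rows = [list(x0)]
    for _ in range(N):
        x = rows[-1]
        step = solve(dfcn(x), fcn(x))
        rows.append([xi - si for xi, si in zip(x, step)])
    return rows


def fcn(v):
    x, y = v
    return [x * x + 4 * y * y - 16, x * x - y * y - 1]


def dfcn(v):
    x, y = v
    return [[2 * x, 8 * y], [2 * x, -2 * y]]


for row in newton(6, [1.0, 1.0], fcn, dfcn):
    print(row)
```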


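EXAMPLE 3.27 Estimate the first-quadrant solution of the simultaneous equations

$$\frac{x^2}{16} + \frac{y^2}{4} = 1 \quad \text{(an ellipse)}$$

$$x^2 - y^2 = 1 \quad \text{(a hyperbola)}$$

Solution The solution is, from Example 2.53, $(2, \sqrt{3})$.
To define the function $f: \mathbb{R}^2 \to \mathbb{R}^2$ we first multiply the first equation by 16. Then $f$ is given by

$$f(x, y) = \begin{bmatrix} x^2 + 4y^2 - 16 \\ x^2 - y^2 - 1 \end{bmatrix}$$

or

∇ Z←FCN X
[1] Z←(((X[1]*2)+4×X[2]*2),(X[1]*2)−X[2]*2)−16 1
∇

and

$$Df(x, y) = \begin{bmatrix} 2x & 8y \\ 2x & -2y \end{bmatrix}$$

or

∇ Z←DFCN X
[1] Z←2 2⍴2 8 2 ¯2×4⍴X
∇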




We start with an initial estimate of $x_0 = (1, 1)$ and compute six iterations.

⎕←ANS←6 NEWTON 1 1
1.0000 1.0000
2.5000 2.0000
2.0500 1.7500
2.0010 1.7320
2.0000 1.7320
2.0000 1.7320
2.0000 1.7320
FCN ANS[6;]
8.674E¯15 8.639E¯15

Thus after six iterations we are quite close. Since in this case the answer is known to be $(2, \sqrt{3})$, we can find the actual error.

ANS−(⍴ANS)⍴2,3*0.5
¯1.000E0 ¯7.321E¯1
5.000E¯1 2.679E¯1
5.000E¯2 1.795E¯2
6.098E¯4 9.205E¯5
9.292E¯8 2.445E¯9
2.161E¯15 5.204E¯18
3.469E¯18 3.469E¯18


Since the APL processor used for this example carries only eighteen digits, we 
cannot expect to do better. 


Newton's method usually converges quadratically, once a sufficiently close estimate is obtained. Quadratic convergence means that if the order of magnitude of the error at step $n$ is $E$, then the order of magnitude of the error at step $n + 1$ is $CE^2$ for some constant $C$.
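For the particular system of Example 3.27 the quadratic convergence can be verified by hand. The rearrangement below is our own, not the text's. Writing $f$ as a constant matrix $M$ times $(x^2 - 4, y^2 - 3)$ makes the Newton step split into two independent one-variable iterations, each of them the classical square-root iteration:

```latex
\begin{aligned}
M &= \begin{bmatrix} 1 & 4 \\ 1 & -1 \end{bmatrix}, \qquad
f(x,y) = M \begin{bmatrix} x^2 - 4 \\ y^2 - 3 \end{bmatrix}, \qquad
Df(x,y) = M \begin{bmatrix} 2x & 0 \\ 0 & 2y \end{bmatrix} \\
Df(x,y)^{-1} f(x,y) &= \begin{bmatrix} (x^2 - 4)/(2x) \\ (y^2 - 3)/(2y) \end{bmatrix} \\
x_n - 2 &= x_{n-1} - \frac{x_{n-1}^2 - 4}{2x_{n-1}} - 2 = \frac{(x_{n-1} - 2)^2}{2x_{n-1}} \\
y_n - \sqrt{3} &= \frac{(y_{n-1} - \sqrt{3})^2}{2y_{n-1}} \\
\frac{(0.05)^2}{2(2.05)} &\approx 6.098 \times 10^{-4}
\end{aligned}
```

So here the constant $C$ is about $1/(2 \cdot 2) = 1/4$ for the $x$ component and about $1/(2\sqrt{3})$ for the $y$ component. The last line reproduces the $x$ error in the fourth row of the error table.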


EXERCISES 3.7* 
1. Write a version of CENTSECT that stops when a root of $f(x)$ is trapped in an interval of length less than E.


In exercises 2 through 5 find a root of the given function.


2 fixy= leu get 3. fix) = dx — Tan 
4 fla) = 1 + x88 — 989 (> 0) S. fix) = x + cos.x — sinh x 


In exercises 6 through 9 you may estimate starting values by looking up the general shape 
of the named curves in a mathematical handbook. 


6, Use Newton's method to find where the bifolium (x + »2)? = 3xy cuts the astroid 
x14 yO = 




7. Use Newton's method to find where the Cassinian oval (x? + 
(x <0) cuts the strophoid (1 — x) = 41 + x) 


8. Find where the folium of Descartes x? + y" — 3xy =0 cuts the witch of Agnesi 
Sy + 4y a8: 


(a) By solving the equation of the witch for y, substituting into the equation of the 
folium and using CENTSECT: 


(b) By Newton's method. 
9. Find the four points common to the ellipsoid 


+17 = 


ts 


the hyperboloid of two sheets described by $z^2 - x^2 - y^2 = 1$, and the elliptic cylinder
Ha — 1 + y? 


10. Find the maxima and minima, if any, of the function 


Sx, 9) = 4 + Dvty! gt yh = det + 6 


Hint: Set the derivative equal to zero. 


3.8* Natural Cubic Splines
(A Symmetric Tridiagonal System)


It would seem that Section 3.5 finishes off the problem of solving linear equations. All one has to do is drop the augmented matrix into ECHELON and read off the answer. This is far from true. Often in practice one encounters situations in which GAUSS simply will not do. These situations call for special methods, adapted to the individual problem. This section is devoted to one example that often arises: the natural cubic spline.

Cubic splines are used for curve fitting when there is little or no error in the data‡ and no reason to choose a curve of any particular form to fit the data. A cubic spline goes through the given points with as little "wiggling" as possible, subject to the constraint of having a continuous second derivative. (The term "spline" originates in a drafting instrument used to draw a smooth curve through a set of points plotted on a drafting board.)

A typical modern application is in the field of computer graphics. Some points are specified on a CRT screen, perhaps by touching a light pen to the screen, and the computer then draws a smooth curve through the points.

Splines are also used in cinematographic analysis of the motions of complex systems. A complex system such as a tennis player's upper body or a karate expert's hand and forearm is carefully photographed. The changing positions of reference points are then recorded frame by frame, and cubic splines are fitted to the data. Since splines have continuous second derivatives, the resulting expressions can be differentiated to find the forces and accelerations on the reference points.


‡There is, however, a "least-squares cubic spline" used in the presence of error. This more sophisticated type of spline will not be discussed here.

Splines are piecewise functions. The x axis is divided into many subintervals, and a different formula is used over each one. In the most common situation a different cubic polynomial is used between each successive pair of data points.

Suppose, for example, that the data points are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with $a = x_1 < x_2 < \cdots < x_n = b$. On each interval $[x_i, x_{i+1}]$ we want a cubic polynomial

$$S_i(x) = C_{i1} + C_{i2}x + C_{i3}x^2 + C_{i4}x^3, \qquad i = 1, 2, \ldots, n-1$$


The polynomials must satisfy the conditions (Figure 3.3):

(1) $S_i(x_i) = y_i$, $i = 1, 2, \ldots, n-1$
(2) $S_i(x_{i+1}) = y_{i+1}$, $i = 1, 2, \ldots, n-1$
(3) $S_i'(x_{i+1}) = S_{i+1}'(x_{i+1})$, $i = 1, 2, \ldots, n-2$
(4) $S_i''(x_{i+1}) = S_{i+1}''(x_{i+1})$, $i = 1, 2, \ldots, n-2$

From (1) and (2) we have $S_i(x_{i+1}) = S_{i+1}(x_{i+1})$, so the resulting piecewise function is continuous; by (3) and (4) it also has continuous first and second derivatives.

There are $n - 1$ functions $S_i(x)$ and hence $4(n-1) = 4n - 4$ unknown constants $C_{ij}$. The conditions (1), (2), (3), and (4) provide $2(n-1) + 2(n-2) = 4n - 6$ linear equations among the $C_{ij}$. Thus at least two more equations are needed for a unique solution. One has a natural cubic spline if the two extra equations are taken to be

(5) $S_1''(x_1) = 0$
(6) $S_{n-1}''(x_n) = 0$


It can be shown that the $4n - 4$ equations provided by conditions (1) through (6) uniquely determine the unknown coefficients $C_{ij}$. In a typical application, however, $n$ may be more than 100, and so the matrix is larger than 400 by 400. Many APL systems simply cannot work with such large matrices, and ECHELON is useless for the problem.


FIGURE 3.3

The matrix of the system is large but sparse: it is mostly zeros. Even though it is larger than 400 by 400, each row contains at most six nonzero entries. To solve such systems one devises schemes, adapted to the particular problem, which solve the systems without storing the zeros.

Most schemes for cubic splines first reduce the problem from that of a sparse $(4n-4)$-by-$(4n-4)$ system to a sparse $(n-1)$-by-$(n-1)$ system in which the unknowns are the numbers $S_i'(x_i)$ or $S_i''(x_i)$. We shall use an $(n-2)$-by-$(n-2)$ system whose unknowns are taken from the numbers

$$s_i = S''(x_i), \qquad i = 1, 2, \ldots, n$$

Since we want a natural cubic spline, we have $s_1 = 0$ and $s_n = 0$.
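Since $S_i$ is a cubic, $S_i''(x)$ is the straight line from $(x_i, s_i)$ to $(x_{i+1}, s_{i+1})$, and hence we can write it in the form

$$S_i''(x) = s_i\,\frac{x_{i+1} - x}{\Delta x_i} + s_{i+1}\,\frac{x - x_i}{\Delta x_i}, \qquad i = 1, \ldots, n-1$$

Here $\Delta x_i = x_{i+1} - x_i$. Notice that $S_i''(x)$ is linear in $x$, $S_i''(x_i) = s_i$, and $S_i''(x_{i+1}) = s_{i+1}$, and so the formula is a correct one.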

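If we know the $s_i$'s, it is not difficult to get an expression for $S_i(x)$. Integrating twice and choosing the integration constants cleverly gives

$$S_i(x) = s_i\,\frac{(x_{i+1} - x)^3}{6\,\Delta x_i} + s_{i+1}\,\frac{(x - x_i)^3}{6\,\Delta x_i} + c_i(x_{i+1} - x) + d_i(x - x_i) \qquad (2)$$

Using the conditions (1) [$S_i(x_i) = y_i$] and (2) [$S_i(x_{i+1}) = y_{i+1}$] in this expression gives

$$c_i = \frac{1}{\Delta x_i}\left(y_i - \frac{s_i\,\Delta x_i^2}{6}\right), \qquad d_i = \frac{1}{\Delta x_i}\left(y_{i+1} - \frac{s_{i+1}\,\Delta x_i^2}{6}\right)$$

Suppose we let X, Y, S be vectors with components $(x_i)$, $(y_i)$, and $(s_i)$, respectively. Further, define an APL function Δ by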


∇ Z←Δ X;K
[1] K←¯1+⍴X
[2] Z←X[1+⍳K]−X[⍳K]
∇


Then the constants $c_i$ and $d_i$ are given by

C←((¯1↓Y)÷ΔX)−(¯1↓S)×(ΔX)÷6
D←((1↓Y)÷ΔX)−(1↓S)×(ΔX)÷6

Now let us return to the problem of computing the $n - 2$ numbers $s_i$, $i = 2, 3, \ldots, n-1$. We have used conditions (1) and (2) to compute the $c_i$ and $d_i$. Condition (4) was used to write $S_i''(x)$ in terms of $s_i$ and $s_{i+1}$. We must use condition (3) to determine the $s_i$.

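Differentiating $S_i(x)$ and using the expressions for $c_i$ and $d_i$,

$$S_i'(x) = \frac{s_{i+1}(x - x_i)^2 - s_i(x_{i+1} - x)^2}{2\,\Delta x_i} + \frac{\Delta y_i}{\Delta x_i} - \frac{(s_{i+1} - s_i)\,\Delta x_i}{6}$$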
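Evaluating this derivative at the right end of interval $i$ and at the left end of interval $i + 1$ gives the two one-sided slopes that condition (3) equates. Here $\Delta y_i = y_{i+1} - y_i$. Setting them equal and multiplying by 6 yields the tridiagonal equation:

```latex
\begin{aligned}
S_i'(x_{i+1}) &= \frac{s_{i+1}\,\Delta x_i}{2} + \frac{\Delta y_i}{\Delta x_i}
               - \frac{(s_{i+1} - s_i)\,\Delta x_i}{6}
             = \frac{\Delta y_i}{\Delta x_i} + \frac{\Delta x_i\,(s_i + 2s_{i+1})}{6} \\
S_{i+1}'(x_{i+1}) &= -\frac{s_{i+1}\,\Delta x_{i+1}}{2} + \frac{\Delta y_{i+1}}{\Delta x_{i+1}}
               - \frac{(s_{i+2} - s_{i+1})\,\Delta x_{i+1}}{6}
             = \frac{\Delta y_{i+1}}{\Delta x_{i+1}} - \frac{\Delta x_{i+1}\,(2s_{i+1} + s_{i+2})}{6} \\
\Delta x_i\,s_i + 2(\Delta x_i + \Delta x_{i+1})\,s_{i+1} + \Delta x_{i+1}\,s_{i+2}
  &= 6\left(\frac{\Delta y_{i+1}}{\Delta x_{i+1}} - \frac{\Delta y_i}{\Delta x_i}\right)
\end{aligned}
```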

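Condition (3) states that $S_i'(x_{i+1}) = S_{i+1}'(x_{i+1})$, $i = 1, 2, \ldots, n-2$. Hence we have the following equations for the unknown quantities $s_i$:

$$\Delta x_i\,s_i + 2(\Delta x_i + \Delta x_{i+1})\,s_{i+1} + \Delta x_{i+1}\,s_{i+2} = 6\left(\frac{\Delta y_{i+1}}{\Delta x_{i+1}} - \frac{\Delta y_i}{\Delta x_i}\right) = b_i$$

where $\Delta y_i = y_{i+1} - y_i$. Since $s_1 = s_n = 0$, the augmented matrix of this system is

$$\left[\begin{array}{ccccc|c} 2(\Delta x_1 + \Delta x_2) & \Delta x_2 & 0 & \cdots & 0 & b_1 \\ \Delta x_2 & 2(\Delta x_2 + \Delta x_3) & \Delta x_3 & \cdots & 0 & b_2 \\ 0 & \Delta x_3 & 2(\Delta x_3 + \Delta x_4) & \cdots & 0 & b_3 \\ \vdots & & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \Delta x_{n-2} & 2(\Delta x_{n-2} + \Delta x_{n-1}) & b_{n-2} \end{array}\right]$$

The coefficient matrix is symmetric and tridiagonal; that is, only the main diagonal and the minor diagonals directly above and below are nonzero.

This coefficient matrix, you will notice, is sparse. Of the $(n-2)^2$ entries only $(n-2) + 2(n-3) = 3n - 8$ are nonzero and, since it is symmetric, only $(n-2) + (n-3) = 2n - 5$ numbers need be stored to keep track of the coefficient matrix.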

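The next task is to write a special APL function to solve this special system.

The system is of the form $AX = B$, where

$$A = \begin{bmatrix} a_{11} & a_{12} & 0 & \cdots & 0 \\ a_{12} & a_{22} & a_{23} & \cdots & 0 \\ 0 & a_{23} & a_{33} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{n-1,n} & a_{nn} \end{bmatrix}, \qquad X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$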


Suppose that we clean up the first column of the augmented matrix by pivoting on $a_{11}$ [in our case $a_{11} = 2(\Delta x_1 + \Delta x_2) \neq 0$]:

$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & 0 & \cdots & b_1 \\ 0 & a_{22} - a_{12}^2/a_{11} & a_{23} & \cdots & b_2 - a_{12}b_1/a_{11} \\ 0 & a_{23} & a_{33} & \cdots & b_3 \\ \vdots & & & \ddots & \vdots \end{array}\right]$$

so we can pivot next on the 2;2 entry. It can be shown for the cubic spline matrix† that row interchanges will never be necessary. This means that the matrix is nicely set up for an inductive solution. If we assume that we already know the solution of such a system when it is $n - 1$ by $n - 1$, then we already have the values of the variables $x_2, x_3, \ldots, x_n$, and the value of $x_1$ is just

$$x_1 = (b_1 - a_{12}x_2) \div a_{11}$$

To get the induction started, we notice that the solution of a 1-by-1 system is just $x_1 = b_1 \div a_{11}$.

In the APL function TRIDI the right argument is a matrix with three columns. The first column is the diagonal $a_{11}, a_{22}, a_{33}, \ldots, a_{nn}$. The second column is $a_{12}, a_{23}, \ldots, a_{n-1,n}, 0$ (the zero is just padding), and the third column is $b_1, b_2, \ldots, b_n$.

∇ Z←TRIDI A
[1] Z←A[1;3]÷A[1;1]
[2] →(1=1↑⍴A)/0
[3] A[2;1 3]←A[2;1 3]−A[1;2 3]×A[1;2]÷A[1;1]
[4] Z←TRIDI 1 0↓A
[5] Z←((A[1;3]−A[1;2]×Z[1])÷A[1;1]),Z
∇

On line [1] Z is set equal to $b_1 \div a_{11}$. On line [2] we stop if A has only one row. Otherwise line [3] replaces $a_{22}$ by $a_{22} - a_{12}^2/a_{11}$ and replaces $b_2$ by $b_2 - a_{12}b_1/a_{11}$. On line [4] we pick up the solution of the $(n-1)$-by-$(n-1)$ system and on line [5] set $x_1 = (b_1 - a_{12}x_2) \div a_{11}$.
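Here is a Python transcription of TRIDI, kept recursive to match the APL line by line. Each row of the argument is a triple (a_ii, a_i,i+1, b_i). As in the text, no row interchanges are made. Python's default recursion limit (about 1000) caps the system size, which is one more reason for the loop version asked for in exercise 5.

```python
def tridi(rows):
    """Solve a symmetric tridiagonal system given as rows (a_ii, a_i,i+1, b_i)."""
    d1, e1, b1 = rows[0]
    if len(rows) == 1:
        return [b1 / d1]                                   # lines [1]-[2]: x1 = b1 / a11
    d2, e2, b2 = rows[1]
    reduced = [(d2 - e1 * e1 / d1, e2, b2 - e1 * b1 / d1)] + list(rows[2:])   # line [3]
    z = tridi(reduced)                                     # line [4]
    return [(b1 - e1 * z[0]) / d1] + z                     # line [5]


# 4x1 + x2 = 5, x1 + 4x2 + x3 = 6, x2 + 4x3 = 5 has solution (1, 1, 1).
print(tridi([(4, 1, 5), (4, 1, 6), (4, 0, 5)]))
```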


†See, for example, L. F. Shampine and R. C. Allen, Numerical Computing: An Introduction (Philadelphia: W. B. Saunders, 1973), p. 57.


If the available storage is small, we can always rewrite a recursive function using loops (exercise 5).

The data we are given to begin with consist of a vector X of the numbers $x_1, x_2, \ldots, x_n$ (in ascending order) and the corresponding vector Y of y coordinates $y_1, \ldots, y_n$. From these two vectors we must first build the matrix A to be used as the right argument of TRIDI. A has $n - 2$ rows and 3 columns. Define H by

H←ΔX

Then H is the vector with components $H[i] = \Delta x_i$ and length $n - 1$. We have the identities

A[;1]=2×(¯1↓H)+1↓H
A[;2]=1↓H

This last expression makes $A[n-2;2] = \Delta x_{n-1}$ rather than 0, but this component is just padding: it is not used in any of the computations. Finally, observe that the expression

$$b_i = 6\left(\frac{\Delta y_{i+1}}{\Delta x_{i+1}} - \frac{\Delta y_i}{\Delta x_i}\right)$$

translates into the identity

A[;3]=6×Δ(ΔY)÷ΔX


The function

∇ A←X SETUP Y;H
[1] A←(2×(¯1↓H)+1↓H),[1.5]1↓H←ΔX
[2] A←A,6×Δ(ΔY)÷H
∇

will be useful for setting up the matrix A. (The expression ,[1.5] is the lamination function. It catenates two vectors into the columns of a matrix.)


Finally, we have the question of what form the storage of the cubic polynomials should take. The polynomial of Equation (2) above can be rewritten as

$$S_i(x) = a_i(x - x_i)^3 + b_i(x - x_{i+1})^3 + c_i(x - x_i) + d_i(x - x_{i+1})$$

where

$$a_i = \frac{s_{i+1}}{6\,\Delta x_i}, \qquad b_i = -\frac{s_i}{6\,\Delta x_i}, \qquad c_i = \frac{y_{i+1}}{\Delta x_i} - \frac{s_{i+1}\,\Delta x_i}{6}, \qquad d_i = \frac{s_i\,\Delta x_i}{6} - \frac{y_i}{\Delta x_i}$$

(these $c_i$ and $d_i$ are not the constants of Equation (2)). If we store $a_i, b_i, c_i, d_i$ as the ith row of a matrix CF and if T is any vector of points in the interval $[x_i, x_{i+1}]$, then $S_i$ can be evaluated on all the components of T simultaneously by the expression

((D*3),D←T∘.−X[I,I+1])+.×CF[I;]

For this reason we store the coefficients in the $(n-1)$-by-4 matrix CF with CF[i;] = $(a_i, b_i, c_i, d_i)$. The function SPLINE below takes the data vectors X and Y as input and returns CF.


∇ CF←X SPLINE Y;S;H
[1] S←0,(TRIDI X SETUP Y),0
[2] CF←((1↓S),[1.5]−¯1↓S)÷6×H←H,[1.5]H←ΔX
[3] CF←CF,(((1↓Y),[1.5]−¯1↓Y)÷H)−(H*2)×CF
∇


The function we use to compute points in the interval $[x_i, x_{i+1}]$ is

∇ Z←C EVALAT T;D
[1] Z←((D*3),D←T∘.−T[1,⍴T])+.×C
∇

The left argument C is CF[i;], the first component of the vector T is $x_i$, and the last component is $x_{i+1}$.
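Here is a self-contained Python sketch of the whole pipeline: the SETUP and TRIDI steps, then SPLINE and EVALAT. It stores each row as (a, b, c, d) in the rewritten form $S_i(t) = a(t - x_i)^3 + b(t - x_{i+1})^3 + c(t - x_i) + d(t - x_{i+1})$. The function names are ours. Running it on Example 3.28 reproduces the coefficient rows 1.5 0 ¯0.5 0 and 0 ¯1.5 8 0.5.

```python
def delta(v):
    return [b - a for a, b in zip(v, v[1:])]


def tridi(rows):
    """Recursive symmetric tridiagonal solve, as in TRIDI (no pivoting)."""
    d1, e1, b1 = rows[0]
    if len(rows) == 1:
        return [b1 / d1]
    d2, e2, b2 = rows[1]
    z = tridi([(d2 - e1 * e1 / d1, e2, b2 - e1 * b1 / d1)] + list(rows[2:]))
    return [(b1 - e1 * z[0]) / d1] + z


def spline(X, Y):
    """Coefficient rows (a_i, b_i, c_i, d_i) of the natural cubic spline, as in X SPLINE Y."""
    H = delta(X)
    slopes = [dy / h for dy, h in zip(delta(Y), H)]
    rows = list(zip([2 * (h0 + h1) for h0, h1 in zip(H, H[1:])], H[1:],
                    [6 * d for d in delta(slopes)]))
    s = [0.0] + (tridi(rows) if rows else []) + [0.0]   # natural ends: s_1 = s_n = 0
    cf = []
    for i, h in enumerate(H):
        a = s[i + 1] / (6 * h)
        b = -s[i] / (6 * h)
        c = Y[i + 1] / h - h * h * a                     # y_{i+1}/h - s_{i+1} h/6
        d = -Y[i] / h - h * h * b                        # -y_i/h + s_i h/6
        cf.append((a, b, c, d))
    return cf


def evalat(coeffs, T):
    """Evaluate one coefficient row at points T, where T[0] = x_i and T[-1] = x_{i+1}."""
    a, b, c, d = coeffs
    xi, xj = T[0], T[-1]
    return [a * (t - xi) ** 3 + b * (t - xj) ** 3 + c * (t - xi) + d * (t - xj) for t in T]


cf = spline([0, 1, 2], [0, 1, 8])
print(cf)                              # rows (1.5, 0, -0.5, 0) and (0, -1.5, 8, 0.5)
print(evalat(cf[1], [1.0, 1.5, 2.0]))  # 1.0, 3.9375, 8.0
```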


EXAMPLE 3.28 The three points (0, 0), (1, 1), (2, 8) lie on the cubic $f(x) = x^3$, but $f(x)$ is not the natural cubic spline through the three points. Although $f$ satisfies conditions (1) through (5) [with $S_1(x) = S_2(x) = f(x)$], we have $f''(2) = 12 \neq 0$, so condition (6) is violated.

The natural cubic spline is given by

⎕←CF←0 1 2 SPLINE 0 1 8
1.500 0.000 ¯0.500 0.000
0.000 ¯1.500 8.000 0.500


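That is,

$$S(x) = \begin{cases} \tfrac32 x^3 - \tfrac12 x, & 0 \le x \le 1 \\ -\tfrac32 (x - 2)^3 + 8(x - 1) + \tfrac12 (x - 2), & 1 \le x \le 2 \end{cases}$$

and it is not difficult to see that $S(x)$ satisfies conditions (1) through (6).
To sketch $S(x)$ we compute some intermediate values at intervals of $\Delta x = 0.1$: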


CF[1;] EVALAT 10 CHOP 0 1
0 ¯0.0485 ¯0.088 ¯0.11 ¯0.104 ¯0.0625 0.024 0.164 0.368
0.643 1


FIGURE 3.4


CF[2;] EVALAT 10 CHOP 1 2
1 1.44 1.97 2.56 3.22 3.94 4.7 5.49 6.31 7.15 8


The graphs of f(x) and S(x) on [0,2] are sketched in Figure 3.4. 


In many applications, such as computer graphics, one wants more general curves than the graphs of functions. In particular, one often wants closed curves (such as circles and ellipses). In such cases one uses a parametric spline, in which each coordinate is a cubic in a parameter $t$ on each piece:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a_i t^3 + b_i t^2 + c_i t + d_i \\ e_i t^3 + f_i t^2 + g_i t + h_i \end{bmatrix}$$

For a discussion of such splines, complete with APL functions for computing them, see R. G. Selfridge, "Splines and Graphs," APL Quote-Quad, 8:4 (June 1978), 29-33.


EXERCISES 3.8*

1. Verify that the function S(x) of Example 3.28 satisfies conditions (1) through (4).


(Computer assignment) In exercises 2, 3, and 4 fit a natural cubic spline to the data and sketch the resulting graph.


2. The data of Examples 2.28 and 2.30.


3. The data of Exercise 12 of Section 2.3. 
4. The data of Exercise 14 of Section 2.3. 


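5. Write a version of TRIDI that is not recursive. [Hint: You will need two loops. The first should in effect reduce a symmetric tridiagonal matrix plus an augmentation column to the form

$$\left[\begin{array}{cccccc|c} 1 & p_1 & 0 & \cdots & 0 & 0 & r_1 \\ 0 & 1 & p_2 & \cdots & 0 & 0 & r_2 \\ \vdots & & & \ddots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 & r_n \end{array}\right]$$

The second loop should then solve the latter system by starting with $x_n = r_n$ and working back up.]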


6. Show that if the numbers $\Delta x_i$ are all equal, then the tridiagonal coefficient matrix may be represented by the matrix


(NN) 04 1. (NE2) 00) .9 


CHAPTER FOUR 


Geometry and 
Coordinate Systems 


For vectors with two and three components the algebraic manipulations of the previous chapter may be given geometric interpretations. This is done in the first three sections of this chapter. The connection between the geometry, which is restricted to two and three dimensions, and the algebra, which has no such restriction, is then used to extend the geometric concepts to higher dimensions. The geometry involved is technically known as affine geometry. Euclidean geometry, which involves the notions of distance, angle, and congruence, will be taken up in Chapter

In Section 4.1 vector addition and scalar-vector multiplication are interpreted geometrically for vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$. Geometric questions of the type "What is the intersection of the two planes?" are answered by using the machinery of Chapter 3 to row-reduce a system of linear equations.

Affine coordinate systems are introduced in Section 4.2, and coordinate-change formulas are derived. These coordinate-change formulas are used in Section 4.3 to give a detailed analysis of quadratic functions of two variables. This material will be used in Chapter 5 to analyze quadratic functions of n variables. The analysis of the general quadratic function (a process generally known as "diagonalizing a symmetric matrix") is the basis for the second-derivative test for maxima and minima and for the statistical techniques of principal-component analysis and factor analysis.

In Section 4.4 the geometric notions of the first two sections are formally extended to higher dimensions, and the set of solutions of a system of linear equations in n variables is shown to form a "flat" in $\mathbb{R}^n$. The coordinate-change formulas developed in Section 4.2 are also extended to $\mathbb{R}^n$.

In Section 4.5 the notions of translations and parallelism in $\mathbb{R}^n$ give rise to the definition of a "subspace" of $\mathbb{R}^n$. Special subspaces associated to a matrix are defined and used to prove that a matrix and its transpose have the same rank. Many of the results in succeeding chapters will be phrased in terms of subspaces.
Many of the results in succeeding chapters will be phrased in terms of subspaces. 




FIGURE 4.1 


4.1 Geometric Vectors, Lines, and Planes 


Points in the plane of Euclidean geometry may be identified with vectors in $\mathbb{R}^2$. To do this, one chooses an orthogonal pair of axes and assigns to each point p a pair of coordinates (a, b). The point p may then be identified with the vector (a, b) in $\mathbb{R}^2$.

This identification is far from an empty formalism. Historically, vector algebra has its origins in the Euclidean geometry of the plane and space.† To form the connection with geometry, we identify the vector (a, b) with the directed line segment or "arrow" from the origin of the coordinate system to the point p (Figure 4.1). Once this identification has been made, two of our basic operations with vectors can be given purely geometric interpretations. They are addition of vectors and the multiplication of a vector by a scalar.


VECTOR ADDITION 


Suppose that p has coordinates (a, b) and q has coordinates (c, d). The sum of the vectors is (a, b) + (c, d) = (a + c, b + d). Form the parallelogram determined by the points p, q, and the origin. The shaded triangles in Figure 4.2 are then congruent. From this it can be deduced that the fourth vertex of the parallelogram has coordinates (a + c, b + d). Thus the geometric "sum" of the two vectors corresponds to the fourth vertex of the parallelogram.

There are two useful ways of looking at this geometric version of vector addition. These are pictured in Figure 4.3, where we use the symbol p to denote any of the three entities: the point p, the coordinate pair (a, b) = p, and the directed line segment from the origin to the point p. In Figure 4.3(a) the sum of the line segments p and q is the line segment that makes up the diagonal of the parallelogram.


†First, Gauss interpreted complex numbers as "vectors" in the plane. Next, Hamilton extended the ideas to three-dimensional space with his system of quaternions. The algebra of quaternions then evolved into the vector analysis of physics and engineering.


FIGURE 4.2


FIGURE 4.3 (panels (a) and (b))


In Figure 4.3(b) the parallelogram has been abbreviated to a triangle, To 
form p + q the vector q has been picked up and, without changing its length or 
direction, moved so that its tail coincides with the head of p. The vector p + q is 
then the line segment from the tail of p to the head of 

Figure 4,3(b) comes from a different definition of geometric vector than the 
one we are using. In this alternate development a vector is something with 
“length” and “direction.” Two vectors in different regions of space are then con- 
sidered “equal” if they have the same length and direction. In this development a 
vector has no fixed position but may be drawn anywhere. 

Formally we will consider our vectors to have their tails firmly nailed to the origin. We feel free to move them about for heuristic purposes, however.


SCALAR MULTIPLICATION 


Let the point p have coordinates (a, b) and let $\lambda$ be a scalar. The product of the scalar $\lambda$ and the vector (a, b) is the vector $(\lambda a, \lambda b)$. The corresponding points are plotted in Figure 4.4. The length of $(\lambda a, \lambda b)$ is

$$\sqrt{(\lambda a)^2 + (\lambda b)^2} = |\lambda|\sqrt{a^2 + b^2}$$


We are, of course, free to choose the position of the origin.


4.1 Geometric Vectors, Lines, and Planes 203 


FIGURE 4.4 (panels for $\lambda > 0$ and $\lambda < 0$)


and the slopes of the line segments are

$$\frac{b}{a} = \frac{\lambda b}{\lambda a} \qquad (\lambda \neq 0)$$


Thus scalar multiplication for $\lambda > 0$ simply stretches or shrinks the length of the segment by the factor $\lambda$ without changing the slope. If $\lambda < 0$, there is reversal of direction as well; that is, the line segment is first stretched or shrunk by the factor $|\lambda|$ and then reflected through the origin.

The situation is precisely the same in three-dimensional Euclidean space (Figure 4.5). Choose three mutually perpendicular axes meeting in a single point as a coordinate system. Then each point p may be identified with a vector of coordinates (a, b, c) in $\mathbb{R}^3$ and with the directed line segment from the origin of the coordinate system to p. With these identifications, the vector operations in $\mathbb{R}^3$

$$(a, b, c) + (a', b', c') = (a + a', b + b', c + c')$$
$$\lambda(a, b, c) = (\lambda a, \lambda b, \lambda c)$$


FIGURE 4.5 (the point p = (a, b, c))




FIGURE 4.6 (panels a through d; one point is labeled $p + 2(q - p)$)


have exactly the same geometric meaning as in the plane. In what follows, we will assume for the most part that the vectors lie in three-dimensional space. This makes very little difference to the pictures we draw, since they are really a kind of schematic drawing. Since addition and scalar multiplication have a geometric meaning, we may dispense with the coordinate axes and still use algebraic manipulations. This is the appeal of the vector approach to elementary geometry.

We will constantly form the vector $q - p$ from the vectors p and q, so let us look at what it is geometrically. Now, $p + (q - p) = q$ (to see this, think of them as pairs in $\mathbb{R}^2$ or triples in $\mathbb{R}^3$). So if we take the vector $q - p$ and move it so that its tail coincides with the head of p, then the head of $q - p$ falls on the head of q [Figure 4.6(a)]. We may therefore think of $q - p$ as the vector from p to q. Since $\tfrac12(q - p)$ is half as long as $q - p$, the vector $p + \tfrac12(q - p)$ is halfway along the segment from p to q [Figure 4.6(b)]. The vector $p + 2(q - p)$, on the other hand, lies on the extension of this segment beyond q, and $p + (-2)(q - p)$ lies on the extension of this segment beyond p [Figures 4.6(c), (d)].


Proposition 4.1 Let p and q be distinct points. As the parameter t varies, the tip of the vector

$$l(t) = p + t(q - p) = (1 - t)p + tq$$

sweeps along the line through points p and q. In particular,

for $0 < t < 1$, $l(t)$ lies between p and q
for $t > 1$, q lies between p and $l(t)$
for $t < 0$, p lies between q and $l(t)$ ■


The proposition is illustrated in Figure 4.7. The vector $l(t)$ is a parametric representation of the line through points p and q. Notice that if $p = q$, then $q - p$ is the zero vector and $l(t) = p$ for every t.
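The Python sketch below illustrates Proposition 4.1 numerically. It is not the book's APL, and the helper names `line_point` and `position` are hypothetical. Exact `Fraction` arithmetic keeps points such as the midpoint ($t = 1/2$) free of rounding.

```python
from fractions import Fraction


def line_point(p, q, t):
    """Return l(t) = p + t(q - p) = (1 - t)p + tq, componentwise."""
    return tuple((1 - t) * pi + t * qi for pi, qi in zip(p, q))


def position(t):
    """Describe where l(t) lies relative to p and q (Proposition 4.1)."""
    if t == 0:
        return "at p"
    if t == 1:
        return "at q"
    if 0 < t < 1:
        return "between p and q"
    if t > 1:
        return "q lies between p and l(t)"
    return "p lies between q and l(t)"


p, q = (1, 1), (-3, 0)
for t in (Fraction(1, 2), 2, -2):
    print(t, line_point(p, q, t), position(t))
```

Because the same formula works componentwise, the function handles points in the plane and in space alike.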


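EXAMPLE 4.1

(a) Find a parametric representation of the line in the plane through points (1, 1) and (−3, 0).

(b) Find a parametric representation of the line in space through the points (1, 1, 1) and (−2, 1, −2).

FIGURE 4.7

Solution
(a) Writing vectors as column vectors, take p = (1, 1), q = (−3, 0). Then

$$l(t) = p + t(q - p) = \begin{bmatrix}1\\1\end{bmatrix} + t\left(\begin{bmatrix}-3\\0\end{bmatrix} - \begin{bmatrix}1\\1\end{bmatrix}\right) = \begin{bmatrix}1\\1\end{bmatrix} + t\begin{bmatrix}-4\\-1\end{bmatrix}$$

Notice that since

$$l(0) = \begin{bmatrix}1\\1\end{bmatrix} \quad\text{and}\quad l(1) = \begin{bmatrix}-3\\0\end{bmatrix}$$

are points on this line, another parametric representation is

$$m(t) = q + t(p - q) = \begin{bmatrix}-3\\0\end{bmatrix} + t\left(\begin{bmatrix}1\\1\end{bmatrix} - \begin{bmatrix}-3\\0\end{bmatrix}\right) = \begin{bmatrix}-3\\0\end{bmatrix} + t\begin{bmatrix}4\\1\end{bmatrix}$$

(b)

$$l(t) = \begin{bmatrix}1\\1\\1\end{bmatrix} + t\left(\begin{bmatrix}-2\\1\\-2\end{bmatrix} - \begin{bmatrix}1\\1\\1\end{bmatrix}\right) = \begin{bmatrix}1\\1\\1\end{bmatrix} + t\begin{bmatrix}-3\\0\\-3\end{bmatrix}$$

As in part (a), many other parametric representations are possible.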


In the plane we commonly write a line as a Cartesian equation $y = ax + b$ (or $x = c$ for vertical lines). We easily obtain this Cartesian equation from a parametric representation by "eliminating the parameter."

For the line of Example 4.1(a) we have

$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix} + t\begin{bmatrix}-4\\-1\end{bmatrix}$$

that is, $x = 1 - 4t$, $y = 1 - t$. Solving for t in terms of x gives $t = (1 - x)/4$, and so

$$y = 1 - \frac{1 - x}{4} = \frac{x}{4} + \frac{3}{4} \qquad\text{or}\qquad x - 4y = -3$$

No single equation in x, y, z will give a line in space, however, and the parametric expression is usually to be preferred.
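Note: Any expression of the form

$$l(t) = p + tv, \qquad -\infty < t < \infty,$$

where p and $v \neq 0$ are vectors, sweeps out a line as t varies. It is the line through p and $q = p + v$.

INTERSECTION OF LINES

Let $l(t) = p + tv$ and $m(t) = q + tw$ be parametric representations of two lines. If the lines intersect in a point r, then there are parameter values $t_1$, $t_2$ such that $l(t_1) = m(t_2) = r$. Hence

$$p + t_1 v = q + t_2 w$$

or

$$t_1 v - t_2 w = q - p$$

or

$$[\,v \mid -w\,]\begin{bmatrix}t_1\\t_2\end{bmatrix} = q - p$$

EXAMPLE 4.2 Do the following pairs of lines intersect?

(a) $l(t) = \begin{bmatrix}1\\1\\1\end{bmatrix} + t\begin{bmatrix}1\\-2\\1\end{bmatrix}$, $\quad m(t) = \begin{bmatrix}6\\-6\\3\end{bmatrix} + t\begin{bmatrix}-1\\1\\0\end{bmatrix}$

(b) The line through (1, 1, 1) and (2, 2, 0) and the line through (3, 0, 2) and (6, −2, 3)

(c) [the pair of lines is illegible in the source]

Solutions
(a) If $l(t_1) = m(t_2)$, then

$$\begin{bmatrix}1\\1\\1\end{bmatrix} + t_1\begin{bmatrix}1\\-2\\1\end{bmatrix} = \begin{bmatrix}6\\-6\\3\end{bmatrix} + t_2\begin{bmatrix}-1\\1\\0\end{bmatrix}, \quad\text{that is,}\quad \begin{bmatrix}1 & 1\\-2 & -1\\1 & 0\end{bmatrix}\begin{bmatrix}t_1\\t_2\end{bmatrix} = \begin{bmatrix}5\\-7\\2\end{bmatrix}$$

The echelon form of the augmented matrix is

      ECHELON 3 3⍴1 1 5 ¯2 ¯1 ¯7 1 0 2
1.000 0.000 2.000
0.000 1.000 3.000
0.000 0.000 0.000

or $t_1 = 2$, $t_2 = 3$. The point of intersection is

$$l(2) = \begin{bmatrix}1\\1\\1\end{bmatrix} + 2\begin{bmatrix}1\\-2\\1\end{bmatrix} = \begin{bmatrix}3\\-3\\3\end{bmatrix} = \begin{bmatrix}6\\-6\\3\end{bmatrix} + 3\begin{bmatrix}-1\\1\\0\end{bmatrix} = m(3)$$

(b) Parametric representations of the lines are

$$l(t) = \begin{bmatrix}1\\1\\1\end{bmatrix} + t\begin{bmatrix}1\\1\\-1\end{bmatrix}, \qquad m(t) = \begin{bmatrix}3\\0\\2\end{bmatrix} + t\begin{bmatrix}3\\-2\\1\end{bmatrix}$$

If $l(t_1) = m(t_2)$, then $t_1 v - t_2 w = q - p$. The augmented matrix $[\,v \mid -w \mid q - p\,]$ of this system is

$$\begin{bmatrix}1 & -3 & 2\\1 & 2 & -1\\-1 & -1 & 1\end{bmatrix}$$

which has echelon form $I_3$. Thus there is no solution to $l(t_1) = m(t_2)$, and the lines do not intersect.

(c) If $l(t_1) = m(t_2)$, then [system illegible in the source]. The augmented matrix of the system has echelon form

$$\begin{bmatrix}1 & 2 & 3\\0 & 0 & 0\end{bmatrix}$$

This means that $t_2$ can be chosen arbitrarily and then $t_1 = 3 - 2t_2$. Thus $m(t_2) = l(3 - 2t_2)$, and m and l are two different parametric representations of the same straight line.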
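The book row-reduces with its own APL function ECHELON. As a cross-check outside APL, here is a small Python stand-in. It is an illustration, not the book's ECHELON, and the helper names are hypothetical. It computes the reduced echelon form exactly with `fractions.Fraction` and applies it to the matrices $[\,v \mid -w \mid q - p\,]$ of Example 4.2(a) and (b).

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form of a matrix (list of rows), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        # Look for a nonzero entry in column c at or below row r.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column c is not a pivot column
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return m


def augmented(p, v, q, w):
    """Rows of [v | -w | q - p] for the lines l(t) = p + tv and m(t) = q + tw."""
    return [[v[i], -w[i], q[i] - p[i]] for i in range(len(p))]


# Example 4.2(a): reading off the rows gives t1 = 2, t2 = 3.
print(rref(augmented((1, 1, 1), (1, -2, 1), (6, -6, 3), (-1, 1, 0))))
# Example 4.2(b): the echelon form is the identity, so the system has no solution.
print(rref(augmented((1, 1, 1), (1, 1, -1), (3, 0, 2), (3, -2, 1))))
```

Exact fractions avoid the 0.000-versus-tiny-number ambiguity that floating-point echelon forms can show.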


PARALLEL LINES 


Two lines, l and m, are parallel if we can slide one, without rotating or twisting, onto the other (Figure 4.8). The idea of sliding without rotating or twisting is formalized in the following definition.


FIGURE 4.8 




Definition 4.1 Let a be a vector in $\mathbb{R}^n$. The affine function $T:\mathbb{R}^n \to \mathbb{R}^n$ given by $T(X) = a + X$ is called translation by a.

Suppose that $l(t) = p + tv$ and $m(t) = q + tw$ are parametrizations of l and m, and that l and m are proper lines and not points; that is, $v \neq 0$ and $w \neq 0$. If l and m are parallel, then (Figure 4.8) there will be a vector a such that

$$T(l(t)) = a + p + tv$$

is another parametrization of m. This means that a proper choice of a will cause the system

$$a + p + t_1 v = q + t_2 w$$

to have an infinite number of solutions $(t_1, t_2)$. In turn, this means that for properly chosen a, the echelon form of the matrix

$$[\,v \mid -w \mid q - p - a\,]$$

is

$$\begin{bmatrix}1 & * & *\\0 & 0 & 0\end{bmatrix} \text{ in the plane} \qquad\text{or}\qquad \begin{bmatrix}1 & * & *\\0 & 0 & 0\\0 & 0 & 0\end{bmatrix} \text{ in space}$$

Thus the lines are parallel if and only if the second column of this echelon form is not a pivot column. By Proposition 3.1 this means that −w, and hence w also, is a multiple of v. We will state this result in a form that generalizes to higher dimensions.


Proposition 4.2 Let $l(t) = p + tv$ and $m(t) = q + tw$ represent lines in the plane or space. The lines are parallel if and only if the vectors v and w are linearly dependent. ■


Notice that the proposition does not contain the hypothesis $v \neq 0$ and $w \neq 0$. If either vector is zero, then the two vectors are linearly dependent and one or both of the lines degenerate to a point. We will consider a point to be "parallel" to anything (point, line, plane).

Again assume that $v \neq 0$ and $w \neq 0$, but now suppose that l and m are not parallel. Then v and w are linearly independent, and the first two columns of $[\,v \mid -w \mid q - p\,]$ are pivot columns. In the plane this means that the only possible echelon form is

$$\begin{bmatrix}1 & 0 & *\\0 & 1 & *\end{bmatrix}$$

and so l and m have a unique point of intersection. For distinct lines in the plane, parallelism is equivalent to nonintersection. In space, however, there are two possible echelon forms:

$$\begin{bmatrix}1 & 0 & *\\0 & 1 & *\\0 & 0 & 0\end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}$$

In the first case there is a point of intersection, but in the second case there is no solution to the system $l(t_1) = m(t_2)$.

FIGURE 4.9

Two lines that are neither parallel nor intersecting are called skew (Figure 4.9).
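In space, the three cases (intersecting, parallel, skew) can also be told apart without row reduction. The Python sketch below swaps in the cross product, which this section has not introduced. Two lines are parallel exactly when $v \times w = 0$. Nonparallel lines meet exactly when $q - p$ lies in the span of v and w, that is, when $(q - p) \cdot (v \times w) = 0$. The function name `classify` is hypothetical, and v and w are assumed nonzero.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def classify(p, v, q, w):
    """Relation between l(t) = p + tv and m(t) = q + tw in space (v, w nonzero)."""
    n = cross(v, w)
    if n == (0, 0, 0):  # v and w are linearly dependent
        # Parallel lines coincide when q - p is also a multiple of v.
        return "coincident" if cross(sub(q, p), v) == (0, 0, 0) else "parallel"
    # Nonparallel lines meet iff q - p lies in the plane spanned by v and w.
    return "intersecting" if dot(sub(q, p), n) == 0 else "skew"


print(classify((1, 1, 1), (1, -2, 1), (6, -6, 3), (-1, 1, 0)))  # Example 4.2(a)
print(classify((1, 1, 1), (1, 1, -1), (3, 0, 2), (3, -2, 1)))   # Example 4.2(b)
```

This shortcut is specific to $\mathbb{R}^3$. The row-reduction criterion in the text is the one that generalizes.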


EXAMPLE 4.3
(a) In Example 4.2(b), the line through (1, 1, 1) and (2, 2, 0) and the line through (3, 0, 2) and (6, −2, 3) were shown to have $[\,v \mid -w \mid q - p\,]$ matrix equal to

$$\begin{bmatrix}1 & -3 & 2\\1 & 2 & -1\\-1 & -1 & 1\end{bmatrix}$$

which row-reduces to $I_3$. Thus these lines are skew.

(b) Let l and m be the lines [parametrizations illegible in the source]. Then $[\,v \mid -w \mid q - p\,]$ is [matrix illegible in the source]. One row-reduction step gives a matrix of the form

$$\begin{bmatrix}1 & 2 & 3\\0 & 0 & c\end{bmatrix}, \qquad c \neq 0$$

The lines are parallel and distinct (i.e., not the same line). For a nontrivial intersection the last row should be zero. This suggests adding a suitable vector a [illegible in the source] to l. If we do so and look for solutions of $a + l(t_1) = m(t_2)$, we find that $[\,v \mid -w \mid q - p - a\,]$ has an echelon form whose last row is zero, so the translated line $a + l$ coincides with m. The lines l and m and the vector a are sketched in Figure 4.10. ■

FIGURE 4.10


PLANES

Two distinct points determine a line, and three points, if they do not lie on a line, determine a plane.

Let $p_0, p_1, p_2$ be three noncollinear points determining a plane $\pi$. Let $v_1 = p_1 - p_0$ and $v_2 = p_2 - p_0$. Then any point p in the plane $\pi$ can be written as $p = p_0 + t_1 v_1 + t_2 v_2$ for some choice of scalars $t_1, t_2$. To see this, consider the parallelogram (Figure 4.11) with $p_0$ and p as opposite vertices and sides along the lines $l_i(t) = p_0 + t v_i$ $(i = 1, 2)$. If the other two vertices are $l_1(t_1)$ and $l_2(t_2)$, then $p = p_0 + t_1 v_1 + t_2 v_2$.

If the points $p_0, p_1, p_2$ are all on the same line, then $l_1(t)$ and $l_2(t)$ are two parametric representations of this line, and in particular $v_1$ and $v_2$ are linearly dependent. Conversely, suppose $v_1, v_2$ is a linearly dependent pair of vectors. Then $l_1$ and $l_2$ define parallel lines (Proposition 4.2), both through $p_0$, and hence the same line. So $p_0$, $p_1 = l_1(1)$, and $p_2 = l_2(1)$ are collinear. This gives us an easy test for collinearity.

FIGURE 4.11

Proposition 4.3 The points $p_0, p_1, p_2$ are collinear if and only if the vectors $p_1 - p_0$ and $p_2 - p_0$ are linearly dependent. ■


EXAMPLE 4.4 Are the points (1, 1, 1), (2, 2, 0), and (−1, −1, 3) collinear?

Solution Let $p_0 = (1, 1, 1)$, $p_1 = (2, 2, 0)$, and $p_2 = (-1, -1, 3)$. Then the matrix

$$[\,p_1 - p_0 \mid p_2 - p_0\,] = \begin{bmatrix}1 & -2\\1 & -2\\-1 & 2\end{bmatrix}$$

row-reduces to

$$\begin{bmatrix}1 & -2\\0 & 0\\0 & 0\end{bmatrix}$$

and hence has linearly dependent columns. Thus the points are collinear.


The expression $p(t_1, t_2) = p_0 + t_1(p_1 - p_0) + t_2(p_2 - p_0)$ is called a parametric representation of the plane. The next proposition summarizes the discussion on planes so far.

Proposition 4.4 Let $p_0, p_1, p_2$ be three noncollinear points. A parametric representation of the plane determined by $p_0, p_1, p_2$ is

$$p(t_1, t_2) = p_0 + t_1(p_1 - p_0) + t_2(p_2 - p_0)$$

Conversely, any expression of the form

$$p(t_1, t_2) = p_0 + t_1 v_1 + t_2 v_2$$

is a parametric representation of a plane, provided the vectors $v_1$ and $v_2$ are linearly independent. The plane is determined by the three noncollinear points $p_0$, $p_1 = p_0 + v_1$, and $p_2 = p_0 + v_2$. ■

The computational procedure we used for the intersection of two lines extends immediately to the intersection of two planes, the intersection of a line and a plane, or even to the question of a point lying in a plane.


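EXAMPLE 4.5
(a) Does the point (6, −7, 1) lie in the plane with parametric representation

$$p(t_1, t_2) = \begin{bmatrix}1\\0\\-1\end{bmatrix} + t_1\begin{bmatrix}1\\-2\\1\end{bmatrix} + t_2\begin{bmatrix}1\\-1\\0\end{bmatrix}?$$

(b) Does the line

$$l(t) = \begin{bmatrix}2\\-1\\0\end{bmatrix} + t\begin{bmatrix}1\\-5\\-2\end{bmatrix}$$

intersect the plane

$$p(t_1, t_2) = \begin{bmatrix}1\\1\\1\end{bmatrix} + t_1\begin{bmatrix}1\\-2\\-2\end{bmatrix} + t_2\begin{bmatrix}-1\\3\\2\end{bmatrix}?$$

(c) What is the intersection of the planes

$$p(t_1, t_2) = \begin{bmatrix}1\\2\\3\end{bmatrix} + t_1\begin{bmatrix}-1\\1\\-1\end{bmatrix} + t_2\begin{bmatrix}-3\\2\\1\end{bmatrix}, \qquad q(s_1, s_2) = \begin{bmatrix}-6\\10\\10\end{bmatrix} + s_1\begin{bmatrix}11\\-8\\-1\end{bmatrix} + s_2\begin{bmatrix}-2\\1\\-1\end{bmatrix}?$$

Solutions
(a) We need to find $t_1, t_2$ such that

$$\begin{bmatrix}1\\0\\-1\end{bmatrix} + t_1\begin{bmatrix}1\\-2\\1\end{bmatrix} + t_2\begin{bmatrix}1\\-1\\0\end{bmatrix} = \begin{bmatrix}6\\-7\\1\end{bmatrix}$$

or

$$\begin{bmatrix}1 & 1\\-2 & -1\\1 & 0\end{bmatrix}\begin{bmatrix}t_1\\t_2\end{bmatrix} = \begin{bmatrix}6\\-7\\1\end{bmatrix} - \begin{bmatrix}1\\0\\-1\end{bmatrix} = \begin{bmatrix}5\\-7\\2\end{bmatrix}$$

The echelon form of the augmented matrix

$$\begin{bmatrix}1 & 1 & 5\\-2 & -1 & -7\\1 & 0 & 2\end{bmatrix} \quad\text{is}\quad \begin{bmatrix}1 & 0 & 2\\0 & 1 & 3\\0 & 0 & 0\end{bmatrix}$$

Thus $t_1 = 2$, $t_2 = 3$, and $(6, -7, 1) = p(2, 3)$ lies in the plane.

(b) This time we need $t_1$, $t_2$, and t such that

$$\begin{bmatrix}1\\1\\1\end{bmatrix} + t_1\begin{bmatrix}1\\-2\\-2\end{bmatrix} + t_2\begin{bmatrix}-1\\3\\2\end{bmatrix} = \begin{bmatrix}2\\-1\\0\end{bmatrix} + t\begin{bmatrix}1\\-5\\-2\end{bmatrix}$$

or

$$\begin{bmatrix}1 & -1 & -1\\-2 & 3 & 5\\-2 & 2 & 2\end{bmatrix}\begin{bmatrix}t_1\\t_2\\t\end{bmatrix} = \begin{bmatrix}1\\-2\\-1\end{bmatrix}$$

The echelon form of the augmented matrix

$$\begin{bmatrix}1 & -1 & -1 & 1\\-2 & 3 & 5 & -2\\-2 & 2 & 2 & -1\end{bmatrix} \quad\text{is}\quad \begin{bmatrix}1 & 0 & 2 & 0\\0 & 1 & 3 & 0\\0 & 0 & 0 & 1\end{bmatrix}$$

Hence there is no solution. (The line is parallel to the plane.)

(c) We need $t_1, t_2, s_1, s_2$ such that

$$\begin{bmatrix}1\\2\\3\end{bmatrix} + t_1\begin{bmatrix}-1\\1\\-1\end{bmatrix} + t_2\begin{bmatrix}-3\\2\\1\end{bmatrix} = \begin{bmatrix}-6\\10\\10\end{bmatrix} + s_1\begin{bmatrix}11\\-8\\-1\end{bmatrix} + s_2\begin{bmatrix}-2\\1\\-1\end{bmatrix}$$

or

$$\begin{bmatrix}-1 & -3 & -11 & 2\\1 & 2 & 8 & -1\\-1 & 1 & 1 & 1\end{bmatrix}\begin{bmatrix}t_1\\t_2\\s_1\\s_2\end{bmatrix} = \begin{bmatrix}-7\\8\\7\end{bmatrix}$$

The echelon form of the augmented matrix

$$\begin{bmatrix}-1 & -3 & -11 & 2 & -7\\1 & 2 & 8 & -1 & 8\\-1 & 1 & 1 & 1 & 7\end{bmatrix} \quad\text{is}\quad \begin{bmatrix}1 & 0 & 2 & 0 & 4\\0 & 1 & 3 & 0 & 5\\0 & 0 & 0 & 1 & 6\end{bmatrix}$$

Thus $s_1$ is arbitrary, say $s_1 = t$; then

$$\begin{bmatrix}t_1\\t_2\\s_1\\s_2\end{bmatrix} = \begin{bmatrix}4 - 2t\\5 - 3t\\t\\6\end{bmatrix}$$

Substituting $(s_1, s_2) = (t, 6)$ into q gives the line

$$l(t) = q(t, 6) = \begin{bmatrix}-6\\10\\10\end{bmatrix} + t\begin{bmatrix}11\\-8\\-1\end{bmatrix} + 6\begin{bmatrix}-2\\1\\-1\end{bmatrix} = \begin{bmatrix}-18\\16\\4\end{bmatrix} + t\begin{bmatrix}11\\-8\\-1\end{bmatrix}$$

as the intersection of the two planes.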
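A quick way to check the answer to part (c) is to substitute the solution back into both parametrizations. The short Python check below is an illustration, not part of the book. It confirms that $p(4 - 2t, 5 - 3t) = q(t, 6) = l(t)$ for several integer values of t. Because every side is affine in t, agreement at two values already forces agreement for all t.

```python
def plane(p0, v1, v2, t1, t2):
    """Point p0 + t1*v1 + t2*v2 of a parametrized plane."""
    return tuple(a + t1 * b + t2 * c for a, b, c in zip(p0, v1, v2))


def line(t):
    """Intersection line found in part (c): (-18, 16, 4) + t(11, -8, -1)."""
    return tuple(a + t * b for a, b in zip((-18, 16, 4), (11, -8, -1)))


p0, v1, v2 = (1, 2, 3), (-1, 1, -1), (-3, 2, 1)       # first plane, p
q0, w1, w2 = (-6, 10, 10), (11, -8, -1), (-2, 1, -1)  # second plane, q

for t in range(-3, 4):
    on_p = plane(p0, v1, v2, 4 - 2 * t, 5 - 3 * t)  # (t1, t2) = (4 - 2t, 5 - 3t)
    on_q = plane(q0, w1, w2, t, 6)                  # (s1, s2) = (t, 6)
    assert on_p == on_q == line(t)
print("l(t) lies in both planes for t = -3, ..., 3")
```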


As with lines, two planes are parallel if there is a translation that carries one to the other. A line is parallel to a plane if there is a translation that puts the line inside the plane.

Proposition 4.5
1. Let two proper planes have parametric representations

$$p(t_1, t_2) = p_0 + t_1 v_1 + t_2 v_2$$
$$q(s_1, s_2) = q_0 + s_1 w_1 + s_2 w_2$$

The planes are parallel if and only if $w_1$ and $w_2$ are linear combinations of $v_1$ and $v_2$, that is, $\operatorname{RANK}[\,v_1 \mid v_2 \mid w_1 \mid w_2\,] \le 2$.
2. Let a plane and a line have parametric representations

$$p(t_1, t_2) = p_0 + t_1 v_1 + t_2 v_2, \qquad l(t) = q + tw$$

The line and plane are parallel if and only if the set of vectors $\{v_1, v_2, w\}$ is linearly dependent, that is, $\operatorname{RANK}[\,v_1 \mid v_2 \mid w\,] \le 2$.

Proof We prove statement 1 and leave statement 2 as exercise 26. The two planes are parallel if and only if we can translate one to the other. This means that a proper choice of the vector a will cause the system

$$a + p(t_1, t_2) = q(s_1, s_2) \qquad (4.1)$$

to have an entire plane of solutions. This means at least two nonpivot columns among the first four columns of $[\,v_1 \mid v_2 \mid -w_1 \mid -w_2 \mid q_0 - p_0 - a\,]$. The planes are assumed proper (i.e., not lines or points), so $v_1, v_2$ are linearly independent and the first two columns are pivot columns (Proposition 3.1). Hence the third and fourth columns are nonpivot columns. This means (Proposition 3.1) that $-w_1$ and $-w_2$, and hence $w_1$ and $w_2$, are linear combinations of $v_1$ and $v_2$.

Conversely, suppose $w_1$ and $w_2$ are linear combinations of $v_1$ and $v_2$. Then the third and fourth columns are nonpivot columns. Taking $a = q_0 - p_0$, the solutions of Equation (4.1) are of the form

$$\begin{bmatrix}t_1\\t_2\\s_1\\s_2\end{bmatrix} = s_1\begin{bmatrix}*\\ *\\1\\0\end{bmatrix} + s_2\begin{bmatrix}*\\ *\\0\\1\end{bmatrix} = s_1 u_1 + s_2 u_2$$

with $u_1, u_2$ linearly independent. ■


CARTESIAN EQUATIONS

The solutions of an equation of the form

$$ax + by + cz = d$$

(with a, b, c not all zero) form a plane in space. This is so because the echelon form of the augmented matrix $[\,a\ b\ c \mid d\,]$ will have one pivot column and two nonpivot columns among the first three columns (Proposition 3.7). The solutions will be of the form

$$X = p + t_1 v_1 + t_2 v_2$$

For example, if the equation is

$$y - 3z = 12$$

then the augmented matrix is $[\,0\ 1\ {-3} \mid 12\,]$, which is already in echelon form. The nonpivot columns of $[\,0\ 1\ {-3}\,]$ are 1 and 3, and so x and z may be chosen arbitrarily, say $x = t_1$, $z = t_2$. Then $y = 12 + 3t_2$ and

$$\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}t_1\\12 + 3t_2\\t_2\end{bmatrix} = \begin{bmatrix}0\\12\\0\end{bmatrix} + t_1\begin{bmatrix}1\\0\\0\end{bmatrix} + t_2\begin{bmatrix}0\\3\\1\end{bmatrix}$$

Again by Proposition 3.7, two irredundant equations

$$ax + by + cz = d$$
$$ex + fy + gz = h$$

will define a line in space. The reason is that the solutions contain one arbitrary parameter, so they will be of the form

$$l(t) = p + tv$$

Thus row-reduction can be used to derive a parametric representation of a line or plane from Cartesian equations. Row-reduction may also be used to go from a parametric representation to Cartesian equations.


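EXAMPLE 4.6 Find a pair of Cartesian equations for the line

$$l(t) = \begin{bmatrix}2\\3\\5\end{bmatrix} + t\begin{bmatrix}1\\4\\6\end{bmatrix}$$

Solution Two points that determine the line are

$$l(0) = \begin{bmatrix}2\\3\\5\end{bmatrix}, \qquad l(1) = \begin{bmatrix}3\\7\\11\end{bmatrix}$$

We need numbers a, b, c, d such that l(0) and l(1) are solutions of $ax + by + cz = d$. Write this as

$$-d + ax + by + cz = 0$$

and seek solutions of

$$\begin{bmatrix}-1 & 2 & 3 & 5\\-1 & 3 & 7 & 11\end{bmatrix}\begin{bmatrix}d\\a\\b\\c\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$$

The matrix

$$\begin{bmatrix}-1 & 2 & 3 & 5\\-1 & 3 & 7 & 11\end{bmatrix} \quad\text{row-reduces to}\quad \begin{bmatrix}1 & 0 & 5 & 7\\0 & 1 & 4 & 6\end{bmatrix}$$

So letting $b = t_1$, $c = t_2$, we have

$$\begin{bmatrix}d\\a\\b\\c\end{bmatrix} = t_1\begin{bmatrix}-5\\-4\\1\\0\end{bmatrix} + t_2\begin{bmatrix}-7\\-6\\0\\1\end{bmatrix}$$

Thus we have an infinite number of coefficient sets (d, a, b, c) to choose from. We need two irredundant equations, however; that is, the coefficient vectors must be linearly independent. Taking $(t_1, t_2) = (-1, 0)$ and $(0, -1)$ gives

$$4x - y = 5$$
$$6x - z = 7$$

as a pair of equations with solutions

$$l(t) = \begin{bmatrix}2\\3\\5\end{bmatrix} + t\begin{bmatrix}1\\4\\6\end{bmatrix}$$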
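The null-space computation above can be shortcut in code. A coefficient vector $n = (a, b, c)$ works exactly when n is orthogonal to the direction $v = l(1) - l(0)$, and then $d = n \cdot l(0)$. The Python sketch below uses a hypothetical helper, `cartesian_equations`, and assumes the two points are distinct. It takes the first nonzero component of v as the pivot and lets the other two coefficients be free. This mirrors choosing b and c as the free parameters in Example 4.6.

```python
from fractions import Fraction


def cartesian_equations(p0, p1):
    """Two independent equations a*x + b*y + c*z = d satisfied by every point
    of the line through the distinct points p0 and p1.

    Returns a list of tuples (a, b, c, d).
    """
    v = [Fraction(y) - Fraction(x) for x, y in zip(p0, p1)]  # direction p1 - p0
    k = next(i for i in range(3) if v[i] != 0)              # pivot component
    eqs = []
    for f in (i for i in range(3) if i != k):               # the two free components
        n = [Fraction(0)] * 3
        n[f] = Fraction(1)
        n[k] = -v[f] / v[k]                                 # forces n . v = 0
        d = sum(ni * xi for ni, xi in zip(n, p0))           # n . p0
        eqs.append((n[0], n[1], n[2], d))
    return eqs


# Example 4.6: the line through l(0) = (2, 3, 5) and l(1) = (3, 7, 11).
for a, b, c, d in cartesian_equations((2, 3, 5), (3, 7, 11)):
    print(f"{a}x + {b}y + {c}z = {d}")
```

Up to a factor of −1, the two equations returned here are $4x - y = 5$ and $6x - z = 7$, the pair found in the example.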


EXERCISES 4.1 


In exercises 1 through 22 give the "incidence relations" for the points, lines, or planes. That is: does the point lie on the line or plane; do the lines intersect, are they parallel, coincident; does the line intersect the plane, lie on the plane, and so forth?

[Most of the data for exercises 1 through 22 is illegible in the source. The legible items are listed below.]

3. line: through (1, 1) and (3, 7); point: (2, 4)

5. line: through (−8, 0) and (−7, 1); line: through (−9, 2) and (−8, 2)

13. plane: $\begin{bmatrix}5\\-6\\1\end{bmatrix} + t_1\begin{bmatrix}1\\2\\0\end{bmatrix} + t_2\begin{bmatrix}0\\1\\-2\end{bmatrix}$; line: [partly illegible]

21. plane: through (−2, −4, −7), (−1, −4, −8), and (−1, −3, −10); plane: through (2, −3, [illegible]), ([illegible], −6, 3), and (−11, −8, 11)


23. Let p and q be points in the plane or space. Let s, t be numbers between 0 and 1 such that s + t = 1. Show that r = sp + tq lies on the line segment between p and q.

Hint: Proposition 4.1.

24. If the vertices of a triangle are at the tips of vectors a, b, c, then $\tfrac13(a + b + c)$ is the center of gravity of the triangle. A median of a triangle is a line segment from a vertex to the midpoint of the opposite side, say, from a to $\tfrac12(b + c)$ (see exercise 23). Show that the center of gravity lies on each median.

25. Suppose that the vertices of a parallelogram, taken in order, are at the tips of the vectors a, b, c, d. Show that $\tfrac12(a + c) = \tfrac12(b + d)$ and hence the diagonals of a parallelogram bisect each other (see exercise 23).


42 Coordinate Systems in the Plane and Space 221 


26. Prove statement 2 of Proposition 4.5.
27. Show that a line and a plane in space must either intersect or be parallel.

28. Show that the vectors $u_1, u_2$ defined in the proof of Proposition 4.5 are linearly independent.


4.2 Coordinate Systems 
in the Plane and Space 


We will use coordinate systems of a type called affine coordinate systems. Affine coordinate systems are more general than the familiar rectangular coordinate systems but less general than curvilinear coordinate systems such as polar coordinates.

To lay out an affine grid on a plane, choose an origin and two lines intersecting at the origin for axes. The two axes need not be perpendicular (Figure 4.12). Next choose a unit point for each axis and mark off a uniform scale along each axis. The scales on the two axes need not be the same.

Last, lay out a grid of lines parallel to the chosen axes. To find the coordinates of a point, first find where the grid lines through the point cut the axes and then read the coordinates off the scales marked on the axes (see Figure 4.12).

FIGURE 4.12

In the familiar rectangular coordinate system, the axes are perpendicular and the scales are the same on both axes. Since an affine grid consists of parallelograms, affine coordinate systems are easily handled by vector methods.

FIGURE 4.13

Let a point p have coordinates $X = (x, y)^T$ in the given coordinate system and coordinates $X' = (x', y')^T$ in a new affine coordinate system. Let the point O', the origin of the new coordinate system, be at $p_0$, and let the unit points on the new axes be at $p_1$ and $p_2$. Let $v_i = p_i - p_0$ for i = 1, 2. Then (see Figure 4.13) we have

$$\begin{bmatrix}x\\y\end{bmatrix} = X = p_0 + x'v_1 + y'v_2 = p_0 + [\,v_1 \mid v_2\,]\begin{bmatrix}x'\\y'\end{bmatrix}$$

or

$$X = p_0 + PX' \qquad\text{where}\quad P = [\,p_1 - p_0 \mid p_2 - p_0\,]$$

Since the columns of P are linearly independent (Proposition 4.3), the matrix P is invertible and

$$X' = P^{-1}(X - p_0)$$


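EXAMPLE 4.7 Define an affine coordinate system in the plane with origin at (1, −2), the unit point on the x' axis at (4, −1), and the unit point on the y' axis at (8, 0).

(a) If p has x'y' coordinates (1, 3), what are the xy coordinates of p?
(b) If p has xy coordinates (1, −1), what are the x'y' coordinates of p?

Solution

$$P = [\,p_1 - p_0 \mid p_2 - p_0\,] = \begin{bmatrix}3 & 7\\1 & 2\end{bmatrix}$$

(a)

$$\begin{bmatrix}x\\y\end{bmatrix} = X = p_0 + PX' = \begin{bmatrix}1\\-2\end{bmatrix} + \begin{bmatrix}3 & 7\\1 & 2\end{bmatrix}\begin{bmatrix}1\\3\end{bmatrix} = \begin{bmatrix}1\\-2\end{bmatrix} + \begin{bmatrix}24\\7\end{bmatrix} = \begin{bmatrix}25\\5\end{bmatrix}$$

(b)

$$\begin{bmatrix}x'\\y'\end{bmatrix} = X' = P^{-1}(X - p_0) = \begin{bmatrix}-2 & 7\\1 & -3\end{bmatrix}\left(\begin{bmatrix}1\\-1\end{bmatrix} - \begin{bmatrix}1\\-2\end{bmatrix}\right) = \begin{bmatrix}-2 & 7\\1 & -3\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}7\\-3\end{bmatrix}$$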
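The two formulas $X = p_0 + PX'$ and $X' = P^{-1}(X - p_0)$ translate directly into code. The Python sketch below is our own illustration; the name `affine_frame_2d` is hypothetical. It builds P from the origin and unit points, inverts it with the 2-by-2 inverse formula using exact fractions, and reproduces both parts of Example 4.7.

```python
from fractions import Fraction


def affine_frame_2d(p0, p1, p2):
    """Coordinate changes for the affine frame with origin p0 and unit points p1, p2.

    Returns (to_xy, to_primed) implementing X = p0 + P X' and X' = P^{-1}(X - p0),
    where P = [p1 - p0 | p2 - p0].
    """
    P = [[p1[0] - p0[0], p2[0] - p0[0]],
         [p1[1] - p0[1], p2[1] - p0[1]]]
    det = Fraction(P[0][0] * P[1][1] - P[0][1] * P[1][0])
    if det == 0:
        raise ValueError("p0, p1, p2 are collinear")
    P_inv = [[P[1][1] / det, -P[0][1] / det],   # 2-by-2 inverse formula
             [-P[1][0] / det, P[0][0] / det]]

    def to_xy(xp):
        return tuple(p0[i] + P[i][0] * xp[0] + P[i][1] * xp[1] for i in range(2))

    def to_primed(x):
        d = (x[0] - p0[0], x[1] - p0[1])
        return tuple(P_inv[i][0] * d[0] + P_inv[i][1] * d[1] for i in range(2))

    return to_xy, to_primed


to_xy, to_primed = affine_frame_2d((1, -2), (4, -1), (8, 0))  # Example 4.7
print(to_xy((1, 3)))       # part (a)
print(to_primed((1, -1)))  # part (b)
```

The collinearity check reflects the requirement, from Proposition 4.3, that the columns of P be linearly independent.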


Notice that an affine coordinate system in the plane is completely fixed once the three points $p_0$, $p_1$, and $p_2$ are chosen.

The situation in space is a straightforward extension of the case of the plane. To define an affine coordinate system in space, one chooses an origin and three axes through the origin. The axes cannot, of course, all lie in the same plane (Figure 4.14). Choose unit points $p_1, p_2, p_3$ on the axes and mark off uniform scales.

In this case the point p with x'y'z' coordinates X' has xyz coordinates

$$X = p_0 + x'v_1 + y'v_2 + z'v_3 = p_0 + [\,v_1 \mid v_2 \mid v_3\,]\begin{bmatrix}x'\\y'\\z'\end{bmatrix}$$

or

$$X = p_0 + PX'$$

where $P = [\,v_1 \mid v_2 \mid v_3\,]$ and $v_i = p_i - p_0$.

FIGURE 4.14

We have a valid coordinate system when the points $p_0, p_1, p_2$, and $p_3$ do not all lie in the same plane. It is useful to have a test for this.


Proposition 4.6 The four points $p_0, p_1, p_2$, and $p_3$ are coplanar if and only if the three vectors $p_1 - p_0$, $p_2 - p_0$, and $p_3 - p_0$ are linearly dependent.

Proof This is geometrically evident. We provide an algebraic proof as well.

First assume that the points all lie in the same plane. Let $p(t_1, t_2) = q_0 + t_1 w_1 + t_2 w_2$ be a parametric representation of the plane, and let $Q = [\,w_1 \mid w_2\,]$. Then the plane can be written

$$p(T) = q_0 + QT$$

where $T = [\,t_1\ t_2\,]^T$. Because the points are all in the plane, there are vectors $T_0, T_1, T_2, T_3$ such that $p_i = q_0 + QT_i$. Since

$$p_i - p_0 = q_0 + QT_i - (q_0 + QT_0) = Q(T_i - T_0)$$

we have

$$P = [\,p_1 - p_0 \mid p_2 - p_0 \mid p_3 - p_0\,] = Q[\,T_1 - T_0 \mid T_2 - T_0 \mid T_3 - T_0\,] = QS$$

where P is 3-by-3, Q is 3-by-2, and S is 2-by-3. S has at most two pivot columns among its three columns, so by looking at the echelon form of S we see that SX = 0 has an infinity of solutions. Hence PX = QSX = 0 also has an infinity of solutions. Thus by Proposition 2.11 the columns of P are linearly dependent.

Conversely, suppose that the vectors $p_1 - p_0$, $p_2 - p_0$, $p_3 - p_0$ are linearly dependent. By renumbering the points if necessary, we may assume that $p_3 - p_0 = a(p_1 - p_0) + b(p_2 - p_0)$. Then the expression

$$p(t_1, t_2) = p_0 + t_1(p_1 - p_0) + t_2(p_2 - p_0)$$

gives $p_0 = p(0, 0)$, $p_1 = p(1, 0)$, $p_2 = p(0, 1)$, and $p_3 = p(a, b)$. If the vectors $p_1 - p_0$ and $p_2 - p_0$ are linearly independent, then $p(t_1, t_2)$ is a parametric representation of a plane. Otherwise, it is a line or a point. In any case, $p(t_1, t_2)$ is contained in a plane for all $t_1, t_2$. ■
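Proposition 4.6 gives a row-reduction test. The sketch below swaps in determinants, a tool this section does not develop. Three vectors in $\mathbb{R}^3$ are linearly dependent exactly when the 3-by-3 matrix they form has determinant 0. This minimal Python sketch uses hypothetical helper names and integer arithmetic, so the zero test is exact.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def det3(a, b, c):
    """Determinant of the 3-by-3 matrix with rows a, b, c (same as with columns)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def coplanar(p0, p1, p2, p3):
    """True when p1 - p0, p2 - p0, p3 - p0 are linearly dependent."""
    return det3(sub(p1, p0), sub(p2, p0), sub(p3, p0)) == 0


# Origin and unit points used in Example 4.8: not coplanar, so they give a valid frame.
print(coplanar((1, -1, 2), (-2, 0, 2), (-2, -2, 3), (-2, -2, 4)))
```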


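The proposition shows that the columns of P in the coordinate-change formula given above are independent, and hence that P is invertible. Thus solving for X' in terms of X gives

$$X' = P^{-1}(X - p_0)$$

EXAMPLE 4.8 Define an affine coordinate system in space with origin at (1, −1, 2), the unit point on the x' axis at (−2, 0, 2), the unit point on the y' axis at (−2, −2, 3), and the unit point on the z' axis at (−2, −2, 4).

(a) If p has x'y'z' coordinates (1, 0, 1), what are the xyz coordinates of p?
(b) If p has xyz coordinates (1, 3, 2), what are the x'y'z' coordinates of p?

Solution

$$P = [\,p_1 - p_0 \mid p_2 - p_0 \mid p_3 - p_0\,] = \begin{bmatrix}-3 & -3 & -3\\1 & -1 & -1\\0 & 1 & 2\end{bmatrix}$$

(a)

$$X = p_0 + PX' = \begin{bmatrix}1\\-1\\2\end{bmatrix} + \begin{bmatrix}-3 & -3 & -3\\1 & -1 & -1\\0 & 1 & 2\end{bmatrix}\begin{bmatrix}1\\0\\1\end{bmatrix} = \begin{bmatrix}1\\-1\\2\end{bmatrix} + \begin{bmatrix}-6\\0\\2\end{bmatrix} = \begin{bmatrix}-5\\-1\\4\end{bmatrix}$$

(b) Using the formula $X' = P^{-1}(X - p_0)$,

$$X' = \frac16\begin{bmatrix}-1 & 3 & 0\\-2 & -6 & -6\\1 & 3 & 6\end{bmatrix}\left(\begin{bmatrix}1\\3\\2\end{bmatrix} - \begin{bmatrix}1\\-1\\2\end{bmatrix}\right) = \frac16\begin{bmatrix}-1 & 3 & 0\\-2 & -6 & -6\\1 & 3 & 6\end{bmatrix}\begin{bmatrix}0\\4\\0\end{bmatrix} = \begin{bmatrix}2\\-4\\2\end{bmatrix}$$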


The simplest coordinate change is a translation, moving the axes parallel to themselves to a new location (Figure 4.15). If a is the vector to the new origin, then $p_0 = a$. If $u_i$ is the vector from the origin to the unit point on the ith axis, then $p_i = u_i + a$ and $p_i - p_0 = u_i$. Since $u_i$ has a 1 in the ith component and 0 elsewhere, the matrix P is an identity matrix. This gives the next proposition, which applies to both the plane and space.

FIGURE 4.15

Proposition 4.7 Let new coordinates be defined by translating the origin of coordinates to a. The coordinate-change formulas are

$$X = a + X', \qquad X' = X - a$$

■


An important type of coordinate change in the plane is a rotation of axes. Before deriving the coordinate-change formula for an axis rotation, we need some facts about rotations.

Consider the mapping $T:\mathbb{R}^2 \to \mathbb{R}^2$ that rotates points through an angle of $\theta$ radians about the origin [Figure 4.16(a)]. A parallelogram will rotate to a parallelogram, and rotations preserve lengths. Thus by Proposition 2.17 a rotation about the origin is a linear transformation. To write its matrix, we need only know the images of the points (1, 0) and (0, 1) under T. By the definition of sine and cosine we have [Figure 4.16(b)]

$$T(1, 0) = (\cos\theta, \sin\theta)$$

and, since the rotation will preserve the angle between (1, 0) and (0, 1),

$$T(0, 1) = \left(\cos\left(\theta + \tfrac{\pi}{2}\right), \sin\left(\theta + \tfrac{\pi}{2}\right)\right) = (-\sin\theta, \cos\theta)$$

Thus the rotation is given by

$$\begin{bmatrix}u\\v\end{bmatrix} = T(p) = R_\theta\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$

where the point p has coordinates (x, y). Here $\theta$ is measured in radians; $\theta > 0$ indicates a counterclockwise rotation, and $\theta < 0$ a clockwise rotation.

FIGURE 4.16 (panels a and b)

The inverse of a rotation matrix is easy to calculate. The inverse operation of a rotation through the angle $\theta$ is rotation through the angle $-\theta$; that is,

$$R_\theta^{-1} = R_{-\theta} = \begin{bmatrix}\cos(-\theta) & -\sin(-\theta)\\\sin(-\theta) & \cos(-\theta)\end{bmatrix} = \begin{bmatrix}\cos\theta & \sin\theta\\-\sin\theta & \cos\theta\end{bmatrix}$$


An APL function to produce a rotation matrix $R_\theta$ is

      ∇ Z←ROT X
        Z←2 2⍴1 ¯1 1 1×2 1 1 2○X   ⍝ rows: cos X, -sin X / sin X, cos X
      ∇

where X is the angle $\theta$ in radians. We will need this function from time to time.
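For comparison with ROT, here is the same matrix in Python. The sketch uses only the standard `math` module, and the names `rot` and `matmul` are ours. Floating-point sines and cosines are inexact, so checks should compare with a tolerance rather than test for exact equality.

```python
import math


def rot(theta):
    """Rotation matrix R_theta; theta in radians, counterclockwise if positive."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(A, B):
    """Product of two 2-by-2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# R_theta R_{-theta} should be the identity, up to rounding.
print(matmul(rot(0.7), rot(-0.7)))
```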


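Proposition 4.8 Define x'y' coordinates in the plane by rotating the x and y axes through $\theta$ radians. Then the coordinate-change formulas are

$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}x'\\y'\end{bmatrix}, \qquad \begin{bmatrix}x'\\y'\end{bmatrix} = \begin{bmatrix}\cos\theta & \sin\theta\\-\sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$

Proof We have $p_0 = 0$,

$$p_1 = R_\theta\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}\cos\theta\\\sin\theta\end{bmatrix}, \qquad p_2 = R_\theta\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}-\sin\theta\\\cos\theta\end{bmatrix}$$

and so

$$X = p_0 + PX' = \begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}X' = R_\theta X'$$

■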


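EXAMPLE 4.9 Introduce x'y' coordinates in the plane by translating the origin to the point (3, 3) and then rotating the axes 30° clockwise (Figure 4.17).

(a) What are the xy coordinates of the point with x'y' coordinates $(\sqrt3, 1)$?
(b) What are the x'y' coordinates of the point with xy coordinates (3, 1)?

Solution First we translate the origin of coordinates to the point (3, 3), obtaining an intermediate coordinate system; call it the x''y'' coordinate system. Then by Proposition 4.7 we have

$$X = \begin{bmatrix}3\\3\end{bmatrix} + X''$$

In the x''y'' system we now define the x'y' coordinate system by rotating the axes through 30° in the negative direction. This gives the coordinate-change formula (Proposition 4.8)

$$X'' = R_{-\pi/6}X' = \frac12\begin{bmatrix}\sqrt3 & 1\\-1 & \sqrt3\end{bmatrix}X'$$

FIGURE 4.17

Substituting the second formula into the first gives

$$X = \begin{bmatrix}3\\3\end{bmatrix} + \frac12\begin{bmatrix}\sqrt3 & 1\\-1 & \sqrt3\end{bmatrix}X'$$

(a)

$$X = \begin{bmatrix}3\\3\end{bmatrix} + \frac12\begin{bmatrix}\sqrt3 & 1\\-1 & \sqrt3\end{bmatrix}\begin{bmatrix}\sqrt3\\1\end{bmatrix} = \begin{bmatrix}3\\3\end{bmatrix} + \begin{bmatrix}2\\0\end{bmatrix} = \begin{bmatrix}5\\3\end{bmatrix}$$

(b) Solving the equation

$$\begin{bmatrix}3\\1\end{bmatrix} = \begin{bmatrix}3\\3\end{bmatrix} + \frac12\begin{bmatrix}\sqrt3 & 1\\-1 & \sqrt3\end{bmatrix}X'$$

gives

$$X' = \frac12\begin{bmatrix}\sqrt3 & -1\\1 & \sqrt3\end{bmatrix}\left(\begin{bmatrix}3\\1\end{bmatrix} - \begin{bmatrix}3\\3\end{bmatrix}\right) = \frac12\begin{bmatrix}\sqrt3 & -1\\1 & \sqrt3\end{bmatrix}\begin{bmatrix}0\\-2\end{bmatrix} = \begin{bmatrix}1\\-\sqrt3\end{bmatrix}$$

where the identity $R_\theta^{-1} = R_{-\theta}$ was used to invert $R_{-\pi/6}$.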
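Both coordinate-change formulas of Example 4.9 can be checked numerically. The Python sketch below uses helper and constant names of our own. It implements $X = (3, 3)^T + R_{-\pi/6}X'$ and its inverse $X' = R_{\pi/6}(X - (3, 3)^T)$. The results are floating point, so they match $(5, 3)$ and $(1, -\sqrt3)$ only up to rounding.

```python
import math


def rot(theta):
    """Rotation matrix R_theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


ORIGIN = (3, 3)          # new origin (Example 4.9)
THETA = -math.pi / 6     # axes rotated 30 degrees clockwise


def to_xy(xp):
    """X = origin + R_theta X'."""
    R = rot(THETA)
    return tuple(ORIGIN[i] + R[i][0] * xp[0] + R[i][1] * xp[1] for i in range(2))


def to_primed(x):
    """X' = R_{-theta}(X - origin), using R_theta^{-1} = R_{-theta}."""
    R = rot(-THETA)
    d = (x[0] - ORIGIN[0], x[1] - ORIGIN[1])
    return tuple(R[i][0] * d[0] + R[i][1] * d[1] for i in range(2))


print(to_xy((math.sqrt(3), 1)))   # part (a)
print(to_primed((3, 1)))          # part (b)
```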


AFFINE FUNCTIONS

Suppose we have an affine transformation $T:\mathbb{R}^n \to \mathbb{R}^n$, where n = 2 or 3, that is, in the plane or space. Suppose that $Y = T(X) = b + AX$. If we change to X' coordinates, where $X = p_0 + PX'$, then $Y = p_0 + PY'$ as well, and

$$p_0 + PY' = b + A(p_0 + PX')$$
$$PY' = (b + Ap_0 - p_0) + APX'$$
$$Y' = P^{-1}(b + Ap_0 - p_0) + P^{-1}APX'$$

or

$$Y' = q_0 + BX'$$

where $B = P^{-1}AP$ and $q_0 = P^{-1}(b + (A - I)p_0)$. We record this formula for future reference.

Proposition 4.9 Let $T:\mathbb{R}^n \to \mathbb{R}^n$ be an affine transformation given by $Y = T(X) = b + AX$. Introduce X' coordinates via the formula $X = p_0 + PX'$. Then, in the new coordinate system, T is given by

$$Y' = c + BX'$$

where $B = P^{-1}AP$ and $c = P^{-1}(b + (A - I)p_0)$. In particular, if T is linear (b = 0) and the coordinate change leaves the origin fixed ($p_0 = 0$), we have

$$Y' = (P^{-1}AP)X'$$

■

The formulas of Proposition 4.9 should not be memorized. In any particular application it is usually easier to start with the formula $X = p_0 + PX'$ and make the appropriate substitution. This is especially true if one wishes to go from X' to X coordinates, which is a very common case. Typically we define a special coordinate system (the X' system) in which our problem has an especially simple form. We solve the problem in the X' coordinate system and then use the formula $X = p_0 + PX'$ to transfer the solution to the original coordinate system. The rest of the examples in this section illustrate this technique.

We may put the formulas of Proposition 4.9 in more compact form by using the projective representation of an affine transformation, which is developed in exercises 39, 40, and 41.


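EXAMPLE 4.10 The plane is mapped onto itself by rotation of 45° clockwise about the point (2, 2). Write a formula for this transformation.

Solution A rotation of 45° clockwise (or −π/4 radians) about the origin is a linear transformation with matrix

$$R_{-\pi/4} = \frac{1}{\sqrt2}\begin{bmatrix}1 & 1\\-1 & 1\end{bmatrix}$$

Define an x'y' coordinate system by translating the origin to (2, 2). Then $X = (2, 2)^T + X'$, and in x'y' coordinates the transformation is $Y' = R_{-\pi/4}X'$. Hence

$$Y = \begin{bmatrix}2\\2\end{bmatrix} + R_{-\pi/4}\left(X - \begin{bmatrix}2\\2\end{bmatrix}\right)$$

or

$$T(x_1, x_2) = \frac{1}{\sqrt2}\begin{bmatrix}1 & 1\\-1 & 1\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} + \begin{bmatrix}2 - 2\sqrt2\\2\end{bmatrix}$$

(This is a rotation of −π/4 about 0, followed by a translation.) ■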
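The answer can be cross-checked against the direct description: subtract the center, rotate, then add the center back. The Python sketch below evaluates both forms on a few points. The helper names are hypothetical, and the arithmetic is floating point, so comparisons use a tolerance.

```python
import math


def rotate_about(x, center, theta):
    """Rotate point x through theta radians about center: center + R_theta (x - center)."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = x[0] - center[0], x[1] - center[1]
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)


def T(x1, x2):
    """Formula from Example 4.10: (1/sqrt 2)[[1, 1], [-1, 1]] X + (2 - 2 sqrt 2, 2)."""
    r = 1 / math.sqrt(2)
    return (r * (x1 + x2) + 2 - 2 * math.sqrt(2), r * (-x1 + x2) + 2)


for pt in [(2, 2), (3, 2), (0, 5)]:
    print(pt, rotate_about(pt, (2, 2), -math.pi / 4), T(*pt))
```

The center (2, 2) should be a fixed point of both forms, which is a quick sanity check on the translation term $(2 - 2\sqrt2, 2)$.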


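EXAMPLE 4.11 The plane is mapped into itself by reflection in the line through the origin at an angle of $\theta$ radians to the x axis. Show that this transformation, T, is linear and find a formula for it.

Solution The easiest way to show T linear is to find a formula for T and thus show that T is multiplication by a matrix.

Reflection in the x axis is given by $(a, b) \mapsto (a, -b)$. Define x'y' coordinates by rotating the xy axes through the angle $\theta$. Then, in x'y' coordinates, T has the formula

$$Y' = \begin{bmatrix}1 & 0\\0 & -1\end{bmatrix}X'$$

The coordinate-change formula is $X = R_\theta X'$ (Proposition 4.8), and so

$$Y = R_\theta Y' = R_\theta\begin{bmatrix}1 & 0\\0 & -1\end{bmatrix}R_\theta^{-1}X = \begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}1 & 0\\0 & -1\end{bmatrix}\begin{bmatrix}\cos\theta & \sin\theta\\-\sin\theta & \cos\theta\end{bmatrix}X$$

$$= \begin{bmatrix}\cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta\\2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta\end{bmatrix}X$$

or

$$\begin{bmatrix}y_1\\y_2\end{bmatrix} = \begin{bmatrix}\cos 2\theta & \sin 2\theta\\\sin 2\theta & -\cos 2\theta\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$$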
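The closed form can be checked numerically against the product $R_\theta\,\mathrm{diag}(1, -1)\,R_\theta^{-1}$ used in the solution. This Python sketch (names are ours) also checks two properties expected of any reflection. Applying it twice gives the identity, and it fixes the direction $(\cos\theta, \sin\theta)$ of the mirror line.

```python
import math


def rot(t):
    """Rotation matrix R_t."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def matmul(A, B):
    """Product of two 2-by-2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def reflection(theta):
    """Reflection in the line through 0 at angle theta: [[cos 2t, sin 2t], [sin 2t, -cos 2t]]."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return [[c2, s2], [s2, -c2]]


theta = 0.4
conjugated = matmul(matmul(rot(theta), [[1, 0], [0, -1]]), rot(-theta))
print(conjugated)
print(reflection(theta))
```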
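EXAMPLE 4.12 We map space to itself by projecting points parallel to the line through the points $(1, 1, 1)$ and $(2, 2, 4)$ onto the plane through the three points $(0, 0, -1)$, $(1, 0, 0)$, and $(0, 1, 0)$. Find the formula for this mapping.

Solution In any coordinate system with $x'$ and $y'$ axes lying in the plane and a $z'$ axis parallel to the direction of projection, the mapping is simply

$$(x'_1, x'_2, x'_3) \mapsto (x'_1, x'_2, 0)$$

Such a coordinate system is given by taking $p_0 = (0, 0, -1)$, $p_1 = (1, 0, 0)$, $p_2 = (0, 1, 0)$, and $p_3 = p_0 + (2, 2, 4) - (1, 1, 1) = p_0 + (1, 1, 3)$. Then

$$Y' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} X' = BX'$$

and

$$X = p_0 + [\,p_1 - p_0 \mid p_2 - p_0 \mid p_3 - p_0\,]X' = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 3 \end{bmatrix} X'$$

Then

$$Y = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} + PY'$$

where $P = [\,p_1 - p_0 \mid p_2 - p_0 \mid p_3 - p_0\,]$; hence

$$Y = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} + PBP^{-1}\left(X - \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}\right)$$

or

$$Y = (I - PBP^{-1})\begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} + PBP^{-1}X$$

or

$$Y = c + AX$$

where $A = PBP^{-1}$; that is,

      ⎕←A←P+.×B+.×⌹P
 2.000 1.000 ¯1.000
 1.000 2.000 ¯1.000
 3.000 3.000 ¯2.000

and $c = (I - PBP^{-1})p_0$; that is,

      ⎕←C←((ID 3)−P+.×B+.×⌹P)+.×0 0 ¯1
¯1.000 ¯1.000 ¯3.000

so

$$Y = \begin{bmatrix} -1 \\ -1 \\ -3 \end{bmatrix} + \begin{bmatrix} 2 & 1 & -1 \\ 1 & 2 & -1 \\ 3 & 3 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

■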
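The answer can be checked against the geometry: $A$ should send the direction of projection $(1, 1, 3)$ to 0, and each of the three given points of the plane should be left fixed. For example,

$$A\begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 + 1 - 3 \\ 1 + 2 - 3 \\ 3 + 3 - 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} -1 \\ -1 \\ -3 \end{bmatrix} + A\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

and similarly $(0, 1, 0) \mapsto (0, 1, 0)$ and $(0, 0, -1) \mapsto (0, 0, -1)$. Since $B^2 = B$, also $A^2 = PB^2P^{-1} = A$, as a projection should satisfy.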


EXERCISES 4.2

In exercises 1 through 5 new affine coordinate systems are specified by giving the origin and unit points. Convert the given $xyz$ coordinates to $x'y'z'$ coordinates and convert the given $x'y'z'$ coordinates to $xyz$ coordinates.
1 po =O. 0. py =. =D, py =(-1.6) 
xjo 0 2 


3. fy =, =2). py = 2, = 6), pp = (2, =5) 


vjoo 1 -1 -10 oor 1-1 
yio lo -1 =10 ylovou = 
4 py = ND, py = (2.0.2) py = 2.1.0), py = (3,32) 
vjou =t soort 
4 { 
ane yooot 
o1-t so 101 
5 1.2) p, =, —4.3 pe 
wou at jo twa 
vjyaot yooto 
vlon = zlo0 12 


In exercises 6 through 11 a coordinate-change formula $X = p_0 + PX'$ is given and a curve is defined in $x'y'$ coordinates. On an $xy$ coordinate system plot the $x'$ and $y'$ axes, mark off the scales on the axes, and sketch the curve by first computing $x'y'$ coordinates of points on the curve and then plotting the corresponding $xy$ coordinates.




Elle shen 
[ae 


6 x=[f]exy 
r=[! tos 


mx-B)eR “ae 


1. (Computer assignment) X= [es Alte stay 


=! 


Hint: G-2 1+ 010 CHOP 0,02 gives x’y’ coordinates for ten points on this curve, 
and (2 23 1-3 1)+.xC gives the corresponding xy coordinates. 


In exercises 12 through 18 a new coordinate system is described. Find the coordinate-change formula $X = p_0 + PX'$.

12. Positive $x'$ axis is positive $y$ axis, positive $y'$ axis is positive $x$ axis, no scale changes.

13. The new axes lie on the old axes, but the distance from the origin to the unit point on the $x'$ axis is double the distance on the $x$ axis, and the distance from the origin to the unit point on the $y'$ axis is half the distance to the unit point on the $y$ axis.

14. The new axes are obtained by rotating the old axes 30° clockwise.

15. The positive $x'$ axis is the negative $y$ axis, the positive $y'$ axis is the negative $x$ axis, and the scales are the same.

16. The new axes are obtained by moving the old axes ten units in the direction of the point $(3, 4)$.

17. The $x'$ axis is the line $3y + 2x = \ldots$ and the $y'$ axis is the line $y = 2x$.
Note: Since the scales on the new axes are unspecified, there will be arbitrary parameters in $P$.

18. The $x'$ and $y'$ axes are obtained by rotating the $xy$ axes 45° counterclockwise about the $z$ axis. The $z'$ axis is obtained by rotating the $z$ axis 180° about the origin (in any direction!).

19. Do the given sets of points all lie in the same plane?
Hint: Use Proposition 4.6.
(a) 1D, (2.2.0), (= 
(by (1D, Ne). 2.13) ND 

(©) (=1, 1.0), (—2,0,0), (3, 1,1, (10,0, 3), (13, =. 4) 
(d) (=1, <1 =1, = 1, =2), (1,0, -2), (—2, = 1,0) 


In exercises 20 through 30 an affine transformation $Y = b + AX$ is given and a new coordinate system is defined by specifying the origin and unit points. Find the form $Y' = q + BX'$ of the affine function in the new coordinate system.


20, 


lets i yen <0. 1. py = (2.2), pe = (0.1) 


Lsl+Cs <a] 
bo] 


2 


+ Po = (0.1). py = (1-0), ps = (= 1,3) 


Py = (2. =2), py = G, —9), ps = B. —2) 




eal + [Fe AR Po =(—4, —3), py = (0. —2), po = 


oe Leable vale 


a ¥=[G]+[7) {ia =C10.0 =0.-.2= 00) 


-2,-3) 


0.4), py = (1, 3), py = (0.6) 


ass tae tlerli a Lol 
° ele Leal m-Ehs-E] 
molt i Helte-fle-E--Ele-fl 
welt g -Llm[ nf nn[ ol 
mf ooh [e[ hfe 


31. The plane is mapped onto itself by a 45° rotation counterclockwise about the point $(1, 0)$. Write a formula for this transformation.
32. The plane is mapped onto itself by reflection in the line $x + y = 1$. Write a formula for this transformation.
33. The plane is mapped into itself by projecting points parallel to the line $y = 2x$ onto the line $x - y = 1$. Write a formula for this transformation.
34. Space is mapped into itself by a rotation of 120° clockwise with axis of rotation the vertical line through the points $(2, 3, -1)$ and $(2, 3, 1)$. Write a formula for this transformation.
35. Space is mapped into itself by projection onto the plane through $(1, 0, 1)$, $(2, -3, 2)$, and $(0, 4, 0)$ parallel to the line through $(1, 0, 1)$ and $(0, 4, 1)$. Write a formula for this transformation.
36. The plane is mapped into itself as follows. A typical point $p$ is first rotated 90° counterclockwise, then translated two units in the direction of the positive $y$ axis, and finally reflected in the line $x = -1$.

(a) Write a formula for this transformation.

Hint: If $Y = T_1(X)$ is the rotation, $Y = T_2(X)$ the translation, and $Y = T_3(X)$ the reflection, then the transformation is $Y = T_3(T_2(T_1(X))) = T(X)$.

(b) Find all the points mapped to themselves by this circuitous route [i.e., solve $X = T(X)$].

(c) Move the origin to any point such that $T(X) = X$ [part (b)], and write the formula for $T$ in the new coordinate system.

(d) Rotate axes 45° and write a formula for the transformation of part (c) in the new coordinate system.

(e) Give a different geometrical description of the transformation $T$.
37. Space is mapped into itself as follows. A typical point is first rotated 30° counterclockwise about the $x$ axis, then 90° clockwise about the $y$ axis, then 150° clockwise about the $z$ axis.
(a) Write a formula for this transformation and then show that it is linear.
Hint: See the hint to exercise 36(a).

(b) This transformation is in fact a rotation in space. The axis is the set of points such that $X = T(X)$. Find the axis. From a sketch, through what angle are the points rotated about this axis?

38. Let $p$ and $q$ be two points in the plane. Show that a rotation of the plane about $q$ is the same as a rotation about $p$, through the same angle, followed by a translation.

Hint: Without loss of generality you may assume that $p$ is the origin. Now imitate Example 4.10.
Projective Representation
The remaining exercises introduce a representation of the affine transformation $T(X) = b + AX$ that gives a coordinate-change formula more convenient for machine work than the formula of Proposition 4.9. Such representations are heavily used in computer-graphics applications.


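39. Let $A$, $B$ be $n$-by-$n$ matrices and let $a$, $b$ be column vectors in $R^n$.

(a) Show that

$$\begin{bmatrix} 1 & 0 \\ a & A \end{bmatrix}\begin{bmatrix} 1 & 0 \\ b & B \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a + Ab & AB \end{bmatrix}$$

where $0 = (0, 0, \ldots, 0)$ in $R^n$.

(b) Show that if $A$ is invertible, then

$$\begin{bmatrix} 1 & 0 \\ a & A \end{bmatrix}$$

is invertible and

$$\begin{bmatrix} 1 & 0 \\ a & A \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -A^{-1}a & A^{-1} \end{bmatrix}$$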


40. Given a vector $X$ in $R^n$, let $\bar X$ denote the vector 1,X in $R^{n+1}$. Let $T: R^n \to R^n$ be the affine function $Y = T(X) = b + AX$, where $b$ is taken to be a single-column matrix. Let $\bar A$ be the partitioned matrix

$$\bar A = \begin{bmatrix} 1 & 0 \\ b & A \end{bmatrix}$$

Show that if $Y = T(X)$, then $\bar Y = \bar A\bar X$; that is, $T(X)$ is 1↓Ā+.×1,X.


Note: The matrix $\bar A$ of problem 40 is a projective or homogeneous representation of the affine transformation $T$.
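To see the construction of exercise 40 on a concrete case (taking, as a sample, the rotation of Example 4.10, where $b = (2 - 2\sqrt{2}, 2)$), the projective representation is a 3-by-3 matrix, and the fixed point $(2, 2)$ corresponds to the column $1, 2, 2$:

$$\bar A = \begin{bmatrix} 1 & 0 & 0 \\ 2 - 2\sqrt{2} & 1/\sqrt{2} & 1/\sqrt{2} \\ 2 & -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}, \qquad \bar A\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 - 2\sqrt{2} + 2\sqrt{2} \\ 2 - \sqrt{2} + \sqrt{2} \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$$

The leading 1 is carried along unchanged; this is what allows the translation part $b$ to be applied by an ordinary matrix product.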


41. Let the affine transformation $T: R^n \to R^n$ have a projective representation

$$\bar Y = \begin{bmatrix} 1 & 0 \\ b & A \end{bmatrix}\bar X = \bar A\bar X$$

(see exercise 40). Let $p_0, p_1, \ldots, p_n$ ($n = 2, 3$) define a new coordinate system. Let $P = [\,p_1 - p_0 \mid p_2 - p_0 \mid \cdots \mid p_n - p_0\,]$ and set

$$L = \begin{bmatrix} 1 & 0 \\ p_0 & P \end{bmatrix}$$

Show that in the new coordinate system $T$ has a projective representation

$$\bar Y' = (L^{-1}\bar A L)\bar X'$$

Hint: To compute $L^{-1}$ see exercise 39. Then compare $L^{-1}\bar A L$ to the formula given in Proposition 4.9.


In exercises 42 through 45 a function is described. Code the function in APL.

42. Name: PREP
Right argument: A set of points $p_0, \ldots, p_n$ ($n = 2, 3$) defining a new affine coordinate system, stored as $[\,p_0 \mid p_1 \mid \cdots \mid p_n\,]$
Result: The matrix $L$ of exercise 41
43. Name: NEWTOOLD
Right argument: A matrix with columns consisting of $X'$ coordinates
Left argument: Points defining the $X'$ coordinates, stored as $[\,p_0 \mid p_1 \mid \cdots \mid p_n\,]$
Result: A matrix with columns consisting of the corresponding $X$ coordinates
Hint: Show $\bar X = L\bar X'$ and assume PREP (exercise 42) exists.
44. Name: OLDTONEW
Right argument: A matrix consisting of $X$ coordinates
Left argument: $[\,p_0 \mid p_1 \mid \cdots \mid p_n\,]$, points defining a new coordinate system
Result: A matrix with columns consisting of the corresponding $X'$ coordinates
Hint: See the hint to exercise 43.
45. Name: NEWAFEN
Right argument: $[\,b \mid A\,]$, where $T(X) = b + AX$ is an affine function and $A$ is square
Left argument: $[\,p_0 \mid p_1 \mid \cdots \mid p_n\,]$, points defining $X'$ coordinates
Result: $[\,b' \mid A'\,]$, where $T(X') = b' + A'X'$

Hint: Exercises 41, 42.

46. Computer assignment: Use the function NEWAFEN of exercise 45 to check your answers to exercises 20 through 30.




4.3 Quadratic Functions in the Plane 


Quadratic functions $f: R^2 \to R$ can be completely analyzed by means of coordinate changes in $R^2$, and we do so here. The method will be extended to quadratic functions $f: R^n \to R$ in Chapter 5. The crucial step is the "diagonalization" of a symmetric matrix by a rotation of coordinates. This diagonalization process is quite important in multivariate statistics and in some optimization problems.

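Suppose that the quadratic function $f: R^2 \to R$ is given by

$$f(x, y) = ax^2 + 2bxy + cy^2 + dx + ey + k$$

which we will write

$$f(x, y) = k + [\,d \;\; e\,]\begin{bmatrix} x \\ y \end{bmatrix} + [\,x \;\; y\,]\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

or

$$f(X) = k + BX + X^TAX$$

where $A = A^T$ and $B$ has a single row.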
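If an affine coordinate change

$$X = p_0 + PX'$$

is made, then

$$\begin{aligned} f(X') &= k + B(p_0 + PX') + (p_0 + PX')^TA(p_0 + PX') \\ &= k + Bp_0 + BPX' + (p_0^T + X'^TP^T)(Ap_0 + APX') \\ &= k + Bp_0 + p_0^TAp_0 + BPX' + p_0^TAPX' + X'^TP^TAp_0 + X'^TP^TAPX' \end{aligned}$$

Now as a matrix $X'^TP^TAp_0$ is 1 by 1 and so is certainly symmetric; since $A = A^T$, it therefore equals its transpose $p_0^TAPX'$. Thus

$$f(X') = f(p_0) + (B + 2p_0^TA)PX' + X'^T(P^TAP)X'$$

First one chooses $P$ so that there is no $x'y'$ term: $(P^TAP)[1;2] = 0$.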


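PROPOSITION 4.10 Let

$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}, \qquad R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Then

$$R_\theta^{-1}AR_\theta = \begin{bmatrix} \lambda & 0 \\ 0 & \mu \end{bmatrix}$$

where

$$\theta = \begin{cases} \pi/4 & \text{if } a = c \\[4pt] \tfrac12\operatorname{Tan}^{-1}\dfrac{2b}{a - c} & \text{if } a \ne c \end{cases}$$

Proof Carrying out the matrix multiplication, we see that

$$(R_\theta^{-1}AR_\theta)[2;1] = \left(\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\right)[2;1]$$

$$= (c - a)\sin\theta\cos\theta + b(\cos^2\theta - \sin^2\theta) = \tfrac12(c - a)\sin 2\theta + b\cos 2\theta$$

If $a = c$, we choose $\theta = \pi/4$ and $(R_\theta^{-1}AR_\theta)[2;1] = 0$. If $a \ne c$, then we choose $\theta$ such that

$$\tfrac12(c - a)\sin 2\theta + b\cos 2\theta = 0, \qquad\text{that is,}\qquad \tan 2\theta = \frac{2b}{a - c}$$

Since $R_\theta^{-1} = R_\theta^T$, the matrix $R_\theta^{-1}AR_\theta$ is symmetric, so its $[1;2]$ entry vanishes as well. ■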

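If $P = R_\theta$ is chosen according to Proposition 4.10, then, since $R_\theta^{-1} = R_\theta^T$, the second-degree terms of $f(X')$ are

$$X'^T(P^TAP)X' = [\,x' \;\; y'\,]\begin{bmatrix} \lambda & 0 \\ 0 & \mu \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = \lambda x'^2 + \mu y'^2$$

Thus an axis rotation will always eliminate the term $2bxy$.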

Next one can shift the origin in an attempt to simplify the linear term $(B + 2p_0^TA)PX'$ as much as possible. If we try to set the linear term to zero, we have

$$\begin{aligned} (B + 2p_0^TA)P &= 0 \\ B + 2p_0^TA &= 0 \qquad (P \text{ is invertible}) \\ B^T + 2Ap_0 &= 0 \qquad (\text{transpose of both sides; } A^T = A) \\ Ap_0 &= -\tfrac12 B^T \end{aligned}$$

If $A$ is invertible, we may choose $p_0 = -\tfrac12 A^{-1}B^T$, and this origin shift will eliminate the first-degree terms (in elementary courses this is called "completing the square"). If $A$ is not invertible, however, it may or may not be possible to choose $p_0$ so that the linear terms are eliminated.
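In one variable the computation above is exactly completing the square. With $A = [a]$ and $B = [d]$, the condition $Ap_0 = -\tfrac12 B^T$ gives $p_0 = -d/(2a)$, and

$$ax^2 + dx + k = a\left(x + \frac{d}{2a}\right)^2 + k - \frac{d^2}{4a}$$

In general, whenever $Ap_0 = -\tfrac12 B^T$, the new constant term simplifies: $p_0^TAp_0 = -\tfrac12 p_0^TB^T = -\tfrac12 Bp_0$ (a 1-by-1 matrix equals its transpose), so

$$f(p_0) = k + Bp_0 + p_0^TAp_0 = k + \tfrac12 Bp_0$$

In the one-variable case this is $k - d^2/(4a)$, the constant left over after completing the square.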


Assume now that $A$ is invertible. Then by a proper coordinate change we can put $f$ into the form

$$f(x', y') = \lambda x'^2 + \mu y'^2 + \delta \qquad (\delta = f(p_0))$$

To gain an understanding of such functions, it is useful to plot the level curves. These are the curves

$$f(x, y) = \text{constant}$$

and the set of these curves gives a contour map of the graph of the function $z = f(x, y)$ in $R^3$.
If we absorb $\delta$ into the constant $c$, the level curves are given by the equation

$$\lambda x'^2 + \mu y'^2 = c$$

which, depending on the signs of $\lambda$, $\mu$, and $c \ne 0$, can (when the curve is not empty) be put into one of the three canonical forms

$$\frac{x'^2}{\alpha^2} + \frac{y'^2}{\beta^2} = 1, \qquad \frac{x'^2}{\alpha^2} - \frac{y'^2}{\beta^2} = 1, \qquad -\frac{x'^2}{\alpha^2} + \frac{y'^2}{\beta^2} = 1$$

where $\alpha^2 = |c|/|\lambda|$ and $\beta^2 = |c|/|\mu|$. The first equation is an ellipse and the second and third are hyperbolas. The curves are sketched in parts (a), (b), and (c) of Figure 4.18. The asymptotes for the hyperbolas are $y' = \pm\beta x'/\alpha$ and are the level curves for $c = 0$. Since the constant $c$ can take on both positive and negative values, there are only two distinct cases for $f(x', y') = \lambda x'^2 + \mu y'^2 + \delta = \text{const}$. If $\lambda$ and $\mu$ have the same sign, the level curves are ellipses; if $\lambda$ and $\mu$ have opposite signs, the level curves are hyperbolas. Typical level curves for the two cases are shown in Figure 4.19.
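For a concrete instance, take $\lambda = 1$, $\mu = 4$, and $c = 4$. Then $\alpha^2 = |c|/|\lambda| = 4$ and $\beta^2 = |c|/|\mu| = 1$, and

$$x'^2 + 4y'^2 = 4 \iff \frac{x'^2}{2^2} + \frac{y'^2}{1^2} = 1$$

is an ellipse with semiaxes 2 and 1. With $\mu = -4$ instead, $x'^2/2^2 - y'^2/1^2 = 1$ is a hyperbola, and its asymptotes $y' = \pm x'/2$ make up the level curve $x'^2 - 4y'^2 = 0$.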

Notice that a square matrix $A$ is invertible if and only if $P^{-1}AP$ is invertible. The matrix

$$\begin{bmatrix} \lambda & 0 \\ 0 & \mu \end{bmatrix}$$

is invertible if and only if $\lambda \ne 0$ and $\mu \ne 0$. Thus Figure 4.19 applies whenever $\lambda\mu \ne 0$. If $\lambda$ and $\mu$ have the same sign, then $f$ has an absolute minimum at $p_0$ if $\lambda$ and $\mu$ are both positive [Figure 4.20(a)] and an absolute maximum at $p_0$ if $\lambda$ and $\mu$ are both negative [Figure 4.20(b)]. If $\lambda$ and $\mu$ have opposite signs, then $f$ has neither maxima nor minima. The graph of $z = f(x, y)$ has a saddle point at $p_0$ [Figure 4.20(c)].

FIGURE 4.18 (a), (b), (c)

FIGURE 4.19

The $x'$ and $y'$ axes give the directions from $p_0$ in which the function $f$ changes the slowest and the fastest. If $|\lambda| > |\mu|$, then the $x'$ axis gives the fastest rate of change; if $|\lambda| < |\mu|$, then the $y'$ axis gives the fastest rate of change. If $\lambda = \mu$, the level curves are circles and the rate of change of $f$ is equal in all directions from $p_0$.


EXAMPLE 4.13 Discuss the quadratic function


Mx,9) = 1 + 2x + By bx? = day 9? 


FIGURE 4.20 (a) $\lambda, \mu > 0$ (b) $\lambda, \mu < 0$ (c) $\lambda\mu < 0$




Solution In matrix form we have 


+22 avex[ |x =a + Bx + TAX 


f(X)= 


Since $a = c$, the $xy$ term may be eliminated by a rotation of $\pi/4$, and since


[ 
2 
translating the origin to 


newer ells THE ]=E) 


will eliminate the linear terms. Since


meal “i 


oat la Wh “ef 5 


Thus the function has no maxima or minima. It has a saddle at $p_0$.
QR tH 143 “| 1 -27/4 


So in the new coordinate system 


we have 


{VY =W+ xt 3% 


EXAMPLE 4.14 Sketch the curve

$$1 + 8x + 12y + 3x^2 + 4xy + 5y^2 = 6$$

Solution This is just one of the level curves of the function

$$f(X) = 1 + [\,8 \;\; 12\,]X + X^T\begin{bmatrix} 3 & 2 \\ 2 & 5 \end{bmatrix}X$$


The rotation matrix that will eliminate the $xy$ term is

      ⎕←P←ROT TH←0.5×¯3○2×2÷3−5
 0.851 0.526
¯0.526 0.851

The diagonalization of the matrix $A$ is

      ⎕←D←(⍉P)+.×(A←2 2⍴3 2 2 5)+.×P
1.76E0   6.94E¯18
9.54E¯18 6.24E0


Since $\lambda$ and $\mu$ are both positive, the level curves of $f$ are ellipses with centers at

      ⎕←P0←(8 12⌹A)÷¯2
¯0.727 ¯0.909

To convert from radians to degrees divide by $\pi/180$:

      TH÷○÷180
¯31.7

so the angle of rotation is a bit more than 30° clockwise.
The constant term in the new coordinate system is $f(p_0)$, or

      ⎕←C←1+(8 12+.×P0)+P0+.×A+.×P0
¯7.36


which is the minimum value attained by $f$. In the new coordinate system the equation $f(x, y) = 6$ is

$$-7.36 + 1.76x'^2 + 6.24y'^2 = 6$$

or

$$1.76x'^2 + 6.24y'^2 = 13.4$$

This ellipse cuts the new axes at $x' = \pm\sqrt{13.4/1.76}$ and $y' = \pm\sqrt{13.4/6.24}$, or

      ⎕←INTERCEPTS←((6−C)÷1 1⍉D)*÷2
2.75 1.46


To sketch the ellipse, one finds some points on the curve in $x'y'$ coordinates and stores them as columns of a matrix $M$. The corresponding $xy$ coordinates are obtained by adding $p_0$ to the columns of $PM$ (see, however, exercise 43 of Exercises 4.2). Notice that the $x'$ axis is the line through the points with $x'y'$ coordinates $(\pm 2.75, 0)$, and the $y'$ axis is the line through the points with $x'y'$ coordinates $(0, \pm 1.46)$:

      M
 2.750 ¯2.750 0.000  0.000
 0.000  0.000 1.460 ¯1.460
      (⍉4 2⍴P0)+P+.×M
 1.610 ¯3.070 0.042 ¯1.500
¯2.360  0.538 0.336 ¯2.150

The curve is sketched using these four points in Figure 4.21. ■

FIGURE 4.21
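The decimals in Example 4.14 can be confirmed exactly. Here $\lambda + \mu = a + c = 8$ and $\lambda\mu = ac - b^2 = 11$, so $\lambda$ and $\mu$ are the roots of $t^2 - 8t + 11 = 0$, and the center solves $Ap_0 = -\tfrac12 B^T$:

$$\lambda, \mu = 4 \mp \sqrt{5} \approx 1.764,\ 6.236, \qquad p_0 = -\frac12\cdot\frac{1}{11}\begin{bmatrix} 5 & -2 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} 8 \\ 12 \end{bmatrix} = \begin{bmatrix} -8/11 \\ -10/11 \end{bmatrix} \approx \begin{bmatrix} -0.727 \\ -0.909 \end{bmatrix}$$

$$f(p_0) = 1 + Bp_0 + p_0^TAp_0 = 1 - \frac{184}{11} + \frac{92}{11} = -\frac{81}{11} \approx -7.364$$

So the level curve $f = 6$ is exactly $(4 - \sqrt{5})x'^2 + (4 + \sqrt{5})y'^2 = 147/11 \approx 13.36$.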


Notice that a given equation may not be satisfied by any point. In the last example the minimum value of $f$ was $-7.36$; thus the equation

$$1 + 8x + 12y + 3x^2 + 4xy + 5y^2 = -10$$

is not satisfied for any pair of real numbers $(x, y)$.


The level curves of $f(X) = k + BX + X^TAX$ are ellipses or hyperbolas when $A$ is nonsingular. When $A$ is singular, they are parabolas or pairs of straight lines (degenerate ellipses).


DEGENERATE ELLIPSES 


This is the case where $A$ is singular but the equation $Ap_0 = -\tfrac12 B^T$ still has solutions. In this case there is a whole line of solutions $p_0$, and placing the origin of the $x'y'$ coordinate system on this line will eliminate the linear terms.

In this case, since $\lambda = 0$ or $\mu = 0$, $f$ may be put in one of the forms

$$f(x', y') = \delta + \lambda x'^2 \qquad\text{or}\qquad f(x', y') = \delta + \mu y'^2$$

Taking, for example, the first form, the level curves are of the form $x' = \text{constant}$, and the graph of $z = f(x, y)$ is a trough or ridge with a horizontal bottom or top (Figure 4.22).


FIGURE 4.22 (a), (b)
PARABOLAS 
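Suppose that $Ap_0 = -\tfrac12 B^T$ does not have solutions. If $P^{-1}AP = D$ has a zero column, then the same column of $AP = PD$ is also zero, since $(PD)[;i] = P(D[;i])$.

Suppose that

$$R_\theta^TAR_\theta = R_\theta^{-1}AR_\theta = \begin{bmatrix} 0 & 0 \\ 0 & \mu \end{bmatrix}$$

Then $AR_\theta$ has a zero first column. Using the coordinate change $X = p_0 + R_\theta X'$ gives

$$f(X') = \delta + (BR_\theta + 2p_0^TAR_\theta)X' + X'^T(R_\theta^TAR_\theta)X' \qquad (\delta = f(p_0))$$

Let

$$[\,b_1 \;\; b_2\,] = BR_\theta, \qquad AR_\theta = \begin{bmatrix} 0 & a_1 \\ 0 & a_2 \end{bmatrix}$$

Then the $X'$ coefficient is

$$[\,b_1 \;\; b_2\,] + 2[\,x_0 \;\; y_0\,]\begin{bmatrix} 0 & a_1 \\ 0 & a_2 \end{bmatrix} = [\,b_1 \;\;\; b_2 + 2a_1x_0 + 2a_2y_0\,]$$

Since $A \ne 0$ (and so $AR_\theta \ne 0$), at least one of the numbers $a_1$, $a_2$ is nonzero, and so the equation $b_2 + 2a_1x_0 + 2a_2y_0 = 0$ has an infinity of solutions $p_0 = (x_0, y_0)$. Thus by the proper choice of $p_0$, $f$ may be put into the form

$$f(x', y') = \delta + \gamma x' + \mu y'^2 \qquad (\gamma = (BR_\theta)[1])$$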


FIGURE 4.23 (a), (b)

Similarly, if $\mu = 0$ and $\lambda \ne 0$, $f$ may be put in the form

$$f(x', y') = \delta + \gamma y' + \lambda x'^2 \qquad (\gamma = (BR_\theta)[2])$$

In either case the level curves $f(X) = \text{constant}$ are congruent parabolas and the graph of $z = f(x, y)$ is a trough with a slanted bottom. The function $f$ has neither maxima nor minima (Figure 4.23).
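EXAMPLE 4.15 Sketch the curve

$$-1 + x - y + x^2 + 2xy + y^2 = 0$$

Solution This is the level curve $f(x, y) = 0$ for

$$f(X) = -1 + [\,1 \;\; -1\,]X + X^T\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}X$$

The matrix A←2 2⍴1 can be diagonalized by a rotation through $\theta = 45$ degrees (Proposition 4.10). In this case we have

$$BR_\theta + 2p_0^TAR_\theta = [\,1 \;\; -1\,]\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} + 2[\,x_0 \;\; y_0\,]\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$$

$$= \frac{1}{\sqrt{2}}\left([\,0 \;\; -2\,] + 2[\,x_0 \;\; y_0\,]\begin{bmatrix} 2 & 0 \\ 2 & 0 \end{bmatrix}\right)$$

$$= [\,2\sqrt{2}(x_0 + y_0) \;\;\; -\sqrt{2}\,]$$

$$= [\,0 \;\; -\sqrt{2}\,] \qquad \text{if we choose } p_0 \text{ on the line } y = -x$$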


The new constant term will be $f(p_0)$, and $f(x, -x) = -1 + 2x$, so take $p_0 = (\tfrac12, -\tfrac12)$. Then $f(p_0) = 0$, and in the new coordinates

$$f(x', y') = -\sqrt{2}\,y' + 2x'^2$$

and so the level curve $f(X) = 0$ is

$$y' = \sqrt{2}\,x'^2$$

A table of values for this curve in $x'y'$ coordinates is

      ⎕←VAL←2 7⍴(6 CHOP ¯3 3),(2*.5)×(6 CHOP ¯3 3)*2
¯3.000 ¯2.000 ¯1.000 0.000 1.000 2.000  3.000
12.700  5.660  1.410 0.000 1.410 5.660 12.700

(The function CHOP is defined in Section 3.1.) Using the formula

$$X = p_0 + PX'$$

the corresponding $xy$ coordinates are

      (⍉7 2⍴.5 ¯.5)+(ROT ○÷4)+.×VAL
¯10.600 ¯4.910 ¯1.210  0.500 0.207 ¯2.090 ¯6.380
  6.380  2.090 ¯0.207 ¯0.500 1.210  4.910 10.600

Using these points, the curve is sketched in Figure 4.24.
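To double-check the sketch, the parabola can be converted back to $xy$ coordinates. With $p_0 = (\tfrac12, -\tfrac12)$ and $P$ the 45° rotation, $X' = P^T(X - p_0)$ gives

$$x' = \frac{x + y}{\sqrt{2}}, \qquad y' = \frac{-x + y + 1}{\sqrt{2}}$$

and $y' = \sqrt{2}\,x'^2$ becomes $-x + y + 1 = (x + y)^2$, that is, $x^2 + 2xy + y^2 + x - y - 1 = 0$, the level curve $f(x, y) = 0$ of Example 4.15. For instance, the tabulated point $(x', y') = (1, \sqrt{2})$ corresponds to $(x, y) \approx (0.207, 1.207)$, in agreement with the fifth column above.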


FIGURE 4.24

The formulas of Proposition 4.10 are often inconvenient for hand calculation. It is possible to write down a matrix that will diagonalize a given 2-by-2 symmetric matrix without the use of trigonometric functions. One uses the trigonometric identities

$$\cos(\operatorname{Tan}^{-1}x) = (1 + x^2)^{-1/2}, \qquad \sin(\operatorname{Tan}^{-1}x) = x(1 + x^2)^{-1/2}$$

plus some algebra to discover the next result, which we can check by computing $P^TP$ and $P^{-1}AP$.


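PROPOSITION 4.11 Let

$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}, \qquad b \ne 0$$

and set $\alpha = a - c$, $\beta = \sqrt{\alpha^2 + 4b^2}$. Then the matrix

$$P = \frac{1}{\sqrt{2\beta}}\begin{bmatrix} \sqrt{\beta + \alpha} & -(\times b)\sqrt{\beta - \alpha} \\ (\times b)\sqrt{\beta - \alpha} & \sqrt{\beta + \alpha} \end{bmatrix}$$

is such that $P^{-1} = P^T$ and

$$P^TAP = \frac12\begin{bmatrix} a + c + \beta & 0 \\ 0 & a + c - \beta \end{bmatrix}$$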
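As an illustration, apply Proposition 4.11 to the matrix $A = \begin{bmatrix} 3 & 2 \\ 2 & 5 \end{bmatrix}$ of Example 4.14, where $a = 3$, $b = 2$, $c = 5$, so $\alpha = -2$ and $\beta = \sqrt{4 + 16} = 2\sqrt{5}$:

$$P = \begin{bmatrix} \sqrt{\dfrac{\sqrt5 - 1}{2\sqrt5}} & -\sqrt{\dfrac{\sqrt5 + 1}{2\sqrt5}} \\[8pt] \sqrt{\dfrac{\sqrt5 + 1}{2\sqrt5}} & \sqrt{\dfrac{\sqrt5 - 1}{2\sqrt5}} \end{bmatrix} \approx \begin{bmatrix} 0.526 & -0.851 \\ 0.851 & 0.526 \end{bmatrix}, \qquad P^TAP = \begin{bmatrix} 4 + \sqrt5 & 0 \\ 0 & 4 - \sqrt5 \end{bmatrix}$$

This $P$ is a rotation through about 58.3°, which differs from the $-31.7°$ rotation of Example 4.14 by 90°; the diagonal entries come out in the opposite order, but the level curves are the same ellipses.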
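EXAMPLE 4.16 Briefly describe the function

$$f(x, y) = 2x - 2y + 2xy - y^2$$

Solution

$$f(X) = [\,2 \;\; -2\,]X + X^T\begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}X$$

So, using the notation of Proposition 4.11, $b = 1$, $\alpha = 1$, $\beta = \sqrt{5}$,

$$P = \frac{1}{\sqrt{2\sqrt5}}\begin{bmatrix} \sqrt{\sqrt5 + 1} & -\sqrt{\sqrt5 - 1} \\ \sqrt{\sqrt5 - 1} & \sqrt{\sqrt5 + 1} \end{bmatrix}, \qquad P^TAP = \frac12\begin{bmatrix} -1 + \sqrt5 & 0 \\ 0 & -1 - \sqrt5 \end{bmatrix}$$

and $p_0 = -\tfrac12 A^{-1}B^T = (0, -1)$.
Since $f(0, -1) = 1$, the coordinate change $X = p_0 + PX'$ puts $f$ into the form

$$f(x', y') = 1 + \tfrac12(\sqrt5 - 1)x'^2 - \tfrac12(1 + \sqrt5)y'^2$$

†Monadic × gives the sign of its argument.

The level curves are hyperbolas centered at $p_0 = (0, -1)$, and so $f$ has a saddle point at $p_0$. ■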
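A quick algebraic check of Example 4.16, with $A = \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}$ and $B = [\,2 \;\; -2\,]$: the center should satisfy $Ap_0 = -\tfrac12 B^T$, and the product of the two diagonal entries of $P^TAP$ should equal $\det A$.

$$\begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} = -\frac12\begin{bmatrix} 2 \\ -2 \end{bmatrix}, \qquad \frac{(-1 + \sqrt5)(-1 - \sqrt5)}{4} = \frac{1 - 5}{4} = -1 = \det A$$

The negative determinant confirms that the diagonal entries have opposite signs, hence the saddle.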


EXERCISES 4.3 


In exercises 1 through 11 use Proposition 4.11 to classify the given curve as a hyperbola, 
parabola, ellipse, pair of parallel lines (degenerate ellipse), pair of intersecting lines (de- 
generate hyperbola), point, or empty set (imaginary ellipse). 

4y + Sx? + ley + Ry? = —4 
4x? +4? — 4e + 8) +5 =0 


1 
3. 2x? + aey + = 1 
4. 
3 


2x? + ey + Sy + Oy = -1 
Sxt — 2ay + My? — 2x + 22y 


6-2" —6y + 2ky = — 

7. =3x% + Oxy + Sy* — 6x — 29 
& 

9. 


Sy — 16x + 16x? — Loxy + 49! 
Ax? = day + y? + Be = dy + 
10. 4x + Bey + 4y?# —y —2y 3 
H. 2x* — So + 8? — 9x + 16 + 10 = 0. 
12. Write an APL function called ANG.

Input: A 2-by-2 symmetric matrix $A$
Output: An angle $\theta$ in radians such that $R_\theta^TAR_\theta$ is diagonal. Thus ROT ANG A computes $R_\theta$.


In exercises 13 through 17 decide if the given function has a maximum, minimum, saddle point, or none of these. If such a point occurs, where is it?

13. fixy) =5 + By — ax + 4x8 + ay 

14, (x.y) = 16 + 4y + Set + I2ay + 82 

15. f(x,y) =4 + By — 1x + 16x? — Hoxy + 4" 

16. flxy) = Woy — 9x + 2x? — Bay + By? 

17. flay) = 6 — 6x — 2y — 3x8 + Gay +5)" 


(Computer assignment) In exercises 18 through 20 perform a rotation and translation of axes to simplify the equation. On an $xy$ coordinate system sketch the $x'$ and $y'$ axes and the curve.


18, Sx? — 2ey + 1b? — 2x + 22y + 10=0 
19, 3x —2y + 4x? + Bey +47 =O 
2, 2—2y = 6x + 5y2 + 6xy — 3x7 =0 


Projective Representation 


The remaining exercises are a continuation of the material developed in exercises 39 through 46 of Exercises 4.2.




21. Let the quadratic function $f: R^n \to R$ be given by

$$f(X) = c + BX + X^TAX$$

where $A = A^T$, $B$ is a single-row matrix, and $c$ is a scalar. Let $Q$ be the symmetric matrix

$$Q = \begin{bmatrix} c & \tfrac12 B \\ \tfrac12 B^T & A \end{bmatrix}$$

Show that the function $f$ is given by

$$f(X) = \bar X^TQ\bar X$$

(see exercise 40 of Exercises 4.2 for $\bar X$).
Note: The matrix $Q$ of exercise 21 is a projective representation of $f$.
22. Let $f: R^n \to R$ have projective representation $Q$ (see exercise 21) and let a new coordinate system be defined by the matrix $L$ of exercise 41 of Exercises 4.2. Show that a projective representation of $f$ in the new coordinate system is

$$f(X') = \bar X'^T(L^TQL)\bar X'$$

Hint: Compute $L^TQL$ and compare with the computation preceding Proposition 4.10.
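23. Write an APL function with
Name: NEWQUAD

Right argument: The terms of the function $f(X) = c + BX + X^TAX$ stored as

$$\begin{bmatrix} c & B \\ 0 & A \end{bmatrix}$$

that is, (c,B),[1]0,A

Left argument: $[\,p_0 \mid p_1 \mid \cdots \mid p_n\,]$ defining $X'$ coordinates

Result:

$$\begin{bmatrix} c' & B' \\ 0 & A' \end{bmatrix}$$

where $f(X') = c' + B'X' + X'^TA'X'$

Hint: Exercises 21, 22.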


4.4 Flats and Coordinate Systems in $R^n$


In this section we formally extend the techniques of the first two sections of this chapter to $R^n$, the set of vectors with $n$ components. Sections 4.2 and 4.3 deal with the cases $n = 2, 3$. The definitions and propositions of this section apply to $n \ge 4$ as well and, in a rather trivial way, to the cases $n = 0$ and $n = 1$.

The first task is to extend the definition of lines and planes to $R^n$. This is done by defining a $k$-dimensional flat in $R^n$, $0 \le k \le n$. A $k$-dimensional flat will be called a point if $k = 0$, a line if $k = 1$, a plane if $k = 2$, and a hyperplane if $k = n - 1$.
For $n = 2, 3$ we have the definitions

point: $p_0 \in R^n$
line: $p(t) = p_0 + t(p_1 - p_0)$, $p_0 \ne p_1 \in R^n$, $-\infty < t < \infty$
plane: $p(t_1, t_2) = p_0 + t_1(p_1 - p_0) + t_2(p_2 - p_0)$, $p_0, p_1, p_2 \in R^n$, $-\infty < t_1, t_2 < \infty$, and $p_1 - p_0$, $p_2 - p_0$ linearly independent

The condition $p_0 \ne p_1$ guarantees that $p(t)$ is a line rather than a point. The linear independence of $p_1 - p_0$, $p_2 - p_0$ guarantees that $p(t_1, t_2)$ sweeps out a plane rather than a line as $t_1$ and $t_2$ vary.


DEFINITION 4.2 Let $p_0, p_1, p_2, \ldots, p_k$ be vectors in $R^n$ with $p_1 - p_0, p_2 - p_0, \ldots, p_k - p_0$ linearly independent. The set of vectors swept out by

$$p(t_1, t_2, \ldots, t_k) = p_0 + t_1(p_1 - p_0) + t_2(p_2 - p_0) + \cdots + t_k(p_k - p_0)$$

as $t_1, t_2, \ldots, t_k$ vary is called a $k$-dimensional flat in $R^n$.

We should say something about the way points and lines are formally made 
to fit this definition. 

Recall (Proposition 2.11) that the columns of the matrix $P = [\,p_1 - p_0 \mid p_2 - p_0 \mid \cdots \mid p_k - p_0\,]$ are linearly independent if and only if the equation $PX = 0$ has only the solution $X = 0$. For a line, $k = 1$, and so $P$ has a single column: $p_1 - p_0$. It follows that $PX = 0$ has only the trivial solution if and only if $p_1 - p_0 \ne 0$, that is, $p_1 \ne p_0$. Thus a line is a one-dimensional flat by Definition 4.2. For the case of a point in $R^n$ we take the matrix $P$ to be $n$ by zero (P←(n,0)⍴0) and adopt the convention that an empty set of vectors is linearly independent. Then Definition 4.2 applies, and a point is a zero-dimensional flat.

Taking an empty set of vectors to be linearly independent is not an arbitrary convention. For example, if $P$ is $n$ by zero, then $P^TP$ is 0 by 0 and hence equals ID 0 (there are no entries to compare). Thus $P$ has a left inverse and the columns of $P$ are linearly independent by Proposition 2.9. Proposition 2.11 leads to the same conclusion (exercise 23).

A more compact notation than that of Definition 4.2 is desirable. We will let $T$ denote the column vector with components $t_1, t_2, \ldots, t_k$ and use the matrix $P$ defined above, $P = [\,p_1 - p_0 \mid p_2 - p_0 \mid \cdots \mid p_k - p_0\,]$. Then the $k$-flat is the set of vectors swept out by

$$p(T) = p_0 + PT \qquad (4.2)$$

as the parameter vector $T$ varies in $R^k$.

Since we no longer have the geometry of $R^2$ and $R^3$ to fall back on, there are some basic facts to be verified algebraically. For example, can a set of points be both a $k$-dimensional flat and an $l$-dimensional flat with $k \ne l$? The answer is no, but the reason why is not obvious.

To begin with, let us refer to the set of vectors $p(T)$ of the form (4.2) as a flat even if the columns of $P$ are not independent. Our first task is to show that a flat is a $k$-dimensional flat for some $k$. The next proposition is our main tool.


PROPOSITION 4.12 The flat $F_1$ defined by

$$q(S) = q_0 + QS, \qquad S \text{ in } R^l$$

is contained in the flat $F_2$ defined by

$$p(T) = p_0 + PT, \qquad T \text{ in } R^k$$

if and only if (1) $q_0$ is in $F_2$ and (2) $Q = PR$ for some $k$-by-$l$ matrix $R$.


Proof First assume that the flat $F_1$ is contained in the flat $F_2$. Then, in particular, $q_0 = q(0)$ is a point of $F_2$, and statement (1) is proved. Let $q_0 = p_0 + PT_0$. Let $q_i = q_0 + QS_i$, where $S_i$ is the $i$th column of the $l$-by-$l$ identity matrix. Then $q_i$ is in $F_1$, hence $q_i$ is in $F_2$ as well. Let $q_i = p_0 + PT_i$, and define $R$ by $R[;i] = T_i - T_0$. Then

$$Q[;i] = q_i - q_0 = p_0 + PT_i - (p_0 + PT_0) = P(T_i - T_0) = PR[;i]$$

and hence $Q = PR$. This proves statement (2).

Next assume that statements (1) and (2) are true. To show that $F_1$ is contained in $F_2$ we must show that for each $S$ in $R^l$ there is a $T$ in $R^k$ such that $q_0 + QS = p_0 + PT$. Since (1) is true, we have $q_0 = p_0 + PT_0$, and since (2) is true,

$$q_0 + QS = p_0 + PT_0 + (PR)S = p_0 + P(T_0 + RS)$$

and $T_0 + RS$ is in $R^k$. ■


Now we will show that a flat is indeed a $k$-dimensional flat for some $k$. The proof is based on a variation of the formulas of Proposition 3.1. Let $P$ be a matrix and let $R$ be the matrix obtained by dropping the rows of zeros, if any, from the echelon form of $P$. Let $V$ be the vector of indices of the pivot columns. Then $P = P[;V]R$ (exercise 24).


PROPOSITION 4.13 The flat defined by

$$p(T) = p_0 + PT, \qquad T \text{ in } R^k$$

is an $l$-dimensional flat, where $l$ is the rank of $P$.

Proof Using the matrix $R$ and vector $V$ defined above, we have $P = P[;V]R$. Applying Proposition 4.12 with $q_0 = p_0$ and $Q = P[;V]$ shows that the flat $p(T)$ is contained in the flat defined by

$$q(S) = p_0 + P[;V]S, \qquad S \text{ in } R^l$$

Similarly, the equation $P[;V] = P\,I[;V]$, where $I$ is the $k$-by-$k$ identity matrix, shows the reverse inclusion. Thus the two flats are equal. Since the columns of $P[;V]$ are linearly independent, the flat is an $l$-dimensional flat where $l$ is the number of pivot columns, that is, the rank of $P$. ■


It follows from the next proposition that the dimension of a flat is well defined: a flat cannot be both $k$-dimensional and $l$-dimensional with $k \ne l$.


PROPOSITION 4.14 Let the $l$-dimensional flat $F_1$ be contained in the $k$-dimensional flat $F_2$. Then $l \le k$, and $l = k$ if and only if $F_1 = F_2$.

Proof Let $F_1$ be given by $q(S) = q_0 + QS$ and let $F_2$ be given by $p(T) = p_0 + PT$, where $Q$ is $n$ by $l$ with linearly independent columns and $P$ is $n$ by $k$ with linearly independent columns. By Proposition 4.12 we have $Q = PR$, where $R$ is $k$ by $l$. Now $Q$ has a left inverse; call it $L$. Since $I = LQ = (LP)R$, $R$ also has linearly independent columns. This means that every column of $R$ is a pivot column (look at the echelon form); hence there are at least as many rows as columns in $R$; that is, $k \ge l$.

Now assume that $l = k$. Then $R$ is square and hence invertible. It follows that $P = QR^{-1}$ and so, by Proposition 4.12 again, $F_2$ will be contained in $F_1$ provided $p_0$ is in $F_1$. But we have great freedom in the choice of $p_0$. In fact, if $p_0'$ is any vector in $F_2$, then the flat given by $p_0' + PT$ is exactly $F_2$ (exercise 25). Replacing $p_0$ by $q_0$, we see that $F_2$ is contained in $F_1$, and so $F_1 = F_2$. ■


DEFINITION 4.3 Let $p_0, p_1, \ldots, p_k$ be points in $R^n$. The flat defined by $p_0, p_1, \ldots, p_k$ is the flat given by

$$p(T) = p_0 + PT, \qquad T \text{ in } R^k$$

where $P = [\,p_1 - p_0 \mid \cdots \mid p_k - p_0\,]$.


ated by a set of points is possible. 


PROPOSITION 4.15 The flat generated by $p_0, p_1, \ldots, p_k$ is the smallest flat containing $p_0, p_1, \ldots, p_k$. That is, any flat that contains $p_0, p_1, \ldots, p_k$ also contains the flat defined by $p_0, p_1, \ldots, p_k$.

The proof is left as an exercise (see exercise 26). ■
The proof of Proposition 4.13 shows us how to find the dimension of the flat generated by a set of points and also how to eliminate redundant points.




EXAMPLE 4.17 What is the dimension of the flat in $R^4$ defined by the points
(1A, D, 2, —L LW), (HL, 6, 2, = 1), (—3, 12.4, 5), (6, = 11, 0, 4)? 


Solution Let these points be $p_0, p_1, p_2, p_3, p_4$, respectively, and set $P = [\,p_1 - p_0 \mid p_2 - p_0 \mid \cdots \mid p_4 - p_0\,]$.

Following the proof of Proposition 4.13, the echelon form of $P$ is

      ECHELON P
1.000 0.000 2.000 0.000
0.000 1.000 3.000 0.000
0.000 0.000 0.000 1.000
0.000 0.000 0.000 0.000

The pivot columns are 1, 2, and 4, so $P$ has rank 3. Thus $p_0, p_1, p_2, p_3, p_4$ define the 3-flat in $R^4$ consisting of all

$$p(T) = p_0 + P[;1\;2\;4]\begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix}$$

that is, the 3-flat is defined by $p_0, p_1, p_2, p_4$.


Finding the intersection of a $k$-flat and an $l$-flat in $R^n$ is also simple. Suppose that the $k$-flat is given by

$$p(T) = p_0 + PT, \qquad P[;i] = p_i - p_0$$

as $T$ varies and the $l$-flat is given by

$$q(T) = q_0 + QT, \qquad Q[;i] = q_i - q_0$$

as $T$ varies. The intersection of these two flats is the set of points expressible both as $p(T_1)$ and as $q(T_2)$ for suitable choices of the parameter vectors $T_1$ and $T_2$. This is the set of solutions of

$$p_0 + PT = q_0 + QS$$

or

$$PT - QS = q_0 - p_0$$

or

$$[\,P \mid -Q\,]\begin{bmatrix} T \\ S \end{bmatrix} = q_0 - p_0$$

where to avoid confusion we have renamed the parameters for the $l$-flat $s_1, s_2, \ldots, s_l$.


EXAMPLE 4.18 Find the intersection in 4-space of the plane through the three points $p_0 = (1, 0, -3, 2)$, $p_1 = (2, -2, -5, 0)$, $p_2 = (2, -1, -5, -2)$ and the plane through the three points $q_0 = (-1, 3, 2, 1)$, $q_1 = (0, 2, \ldots)$, $q_2 = (-3, 6, 3, 4)$.


Solution Let $P = [\,p_1 - p_0 \mid p_2 - p_0\,]$, $Q = [\,q_1 - q_0 \mid q_2 - q_0\,]$:


Pp 
1a 
24 
2 +2 
2-4 

° 
v2 
aes 
a5 
es: 


and reduce the augmented matrix of the system $[\,P \mid -Q\,]X = q_0 - p_0$ to echelon form:

      ECHELON P,(−Q),Q0−P0
1.00E0   0.00E0   0.00E0   0.00E0    1.20E1
3.47E¯18 1.00E0   3.47E¯18 0.00E0   ¯4.00E0
0.00E0   1.39E¯17 1.00E0   2.78E¯17 ¯1.60E1
1.39E¯17 0.00E0   0.00E0   1.00E0   ¯1.30E1

The system has the unique solution $t_1 = 12$, $t_2 = -4$, $s_1 = -16$, $s_2 = -13$, so the two planes intersect in the point

      P0+P+.×12 ¯4
9 ¯20 ¯19 ¯6

which can also be expressed as

      Q0+Q+.×¯16 ¯13




In general, two flats may intersect in more than a point. It is convenient to have a procedure for writing down the intersection. Starting with the equation

$$p_0 + PT = q_0 + QS$$

we get the linear system

$$[P \mid -Q]\begin{bmatrix} T \\ S \end{bmatrix} = q_0 - p_0.$$

Row-reducing in the usual way, we get a solution involving arbitrary parameters, say $u_1, u_2, \ldots, u_m$. In fact, the set of solutions will be expressed in the form

$$X = \begin{bmatrix} T \\ S \end{bmatrix} = r_0 + RU, \qquad \text{where } U = [u_1\ u_2\ \cdots\ u_m]^T.$$

Partition $r_0$ and $R$ in the same way that $X$ is partitioned:

$$r_0 = \begin{bmatrix} v_0 \\ w_0 \end{bmatrix}, \qquad R = \begin{bmatrix} V \\ W \end{bmatrix}.$$

Then

$$\begin{bmatrix} T \\ S \end{bmatrix} = \begin{bmatrix} v_0 \\ w_0 \end{bmatrix} + \begin{bmatrix} V \\ W \end{bmatrix}U = \begin{bmatrix} v_0 + VU \\ w_0 + WU \end{bmatrix},$$

or $T = v_0 + VU$, $S = w_0 + WU$. Substituting in the original equation, we have

$$p_0 + P(v_0 + VU) = q_0 + Q(w_0 + WU)$$

for any choice of the parameters $U = [u_1\ u_2\ \cdots\ u_m]^T$, or

$$(p_0 + Pv_0) + (PV)U = (q_0 + Qw_0) + (QW)U.$$

This means (see exercise 22) that $p_0 + Pv_0 = q_0 + Qw_0$ and $PV = QW$. As the parameter vector $U$ varies,

$$p'(U) = (p_0 + Pv_0) + (PV)U$$

sweeps out the flat defined by $p'_0, p'_1, \ldots, p'_m$, where $p'_0 = p_0 + Pv_0$ and $p'_i = p'_0 + PV[;i]$. This flat is the intersection of the two given flats.

Incidentally, we have proved

Proposition 4.16 The intersection of two flats is a flat. ■
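The whole procedure can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the book's APL code: `flat_intersection` is a hypothetical helper that reads the particular solution $r_0$ and one column of $R$ per free variable off an exact echelon form, keeps the $T$-part ($v_0$ and $V$), and returns $p'_0 = p_0 + Pv_0$ together with the columns of $PV$. It is applied to the data of Example 4.19.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def flat_intersection(p_pts, q_pts):
    """Return (p0', columns of PV) for the intersection, or None if it is empty."""
    p0, q0 = p_pts[0], q_pts[0]
    n, k = len(p0), len(p_pts) - 1
    P = [[p[j] - p0[j] for p in p_pts[1:]] for j in range(n)]   # rows of P
    Q = [[q[j] - q0[j] for q in q_pts[1:]] for j in range(n)]   # rows of Q
    M = [P[j] + [-x for x in Q[j]] + [q0[j] - p0[j]] for j in range(n)]
    E, pivots = rref(M)
    nvar = len(M[0]) - 1
    if nvar in pivots:                   # inconsistent: the flats do not meet
        return None
    r0 = [Fraction(0)] * nvar            # particular solution, free variables = 0
    for i, c in enumerate(pivots):
        r0[c] = E[i][nvar]
    R = []                               # one column of R per free variable
    for f in (c for c in range(nvar) if c not in pivots):
        col = [Fraction(0)] * nvar
        col[f] = Fraction(1)
        for i, c in enumerate(pivots):
            col[c] = -E[i][f]
        R.append(col)
    v0 = r0[:k]                          # T-part of r0; the T-part of R is V
    new_p0 = [p0[j] + sum(P[j][i] * v0[i] for i in range(k)) for j in range(n)]
    PV = [[sum(P[j][i] * col[i] for i in range(k)) for j in range(n)] for col in R]
    return new_p0, PV

p_pts = [(-21, -30, -8, -15, -10), (-16, -25, -7, -13, -8),
         (-21, -30, -8, -16, -11), (-11, -20, -6, -14, -9)]
q_pts = [(22, 52, 8, 19, 14), (20, 49, 7, 17, 12),
         (2, 32, 4, 16, 11), (23, 50, 8, 18, 14)]
new_p0, PV = flat_intersection(p_pts, q_pts)
print([int(x) for x in new_p0])                # [14, 5, -1, -9, -4]
print([[int(x) for x in col] for col in PV])   # [[0, 0, 0, 0, 0], [-20, -20, -4, -3, -3]]
```

The zero column of $PV$ is the parameter that "disappears" in Example 4.19; only the nonzero column contributes a direction to the intersection.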




Example 4.19 Find the intersection in $R^5$ of the flat defined by

$(-21, -30, -8, -15, -10)$,
$(-16, -25, -7, -13, -8)$,
$(-21, -30, -8, -16, -11)$,
$(-11, -20, -6, -14, -9)$

and the flat defined by

$(22, 52, 8, 19, 14)$,
$(20, 49, 7, 17, 12)$,
$(2, 32, 4, 16, 11)$,
$(23, 50, 8, 18, 14)$


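Solution Label the points $p_0, p_1, p_2, p_3$ and $q_0, q_1, q_2, q_3$, respectively, and define $P$ and $Q$ as usual:

$$P = \begin{bmatrix} 5 & 0 & 10 \\ 5 & 0 & 10 \\ 1 & 0 & 2 \\ 2 & -1 & 1 \\ 2 & -1 & 1 \end{bmatrix}, \qquad Q = \begin{bmatrix} -2 & -20 & 1 \\ -3 & -20 & -2 \\ -1 & -4 & 0 \\ -2 & -3 & -1 \\ -2 & -3 & 0 \end{bmatrix}.$$

Reducing the augmented matrix of the system $[P \mid -Q]X = q_0 - p_0$ gives (with round-off errors of order $10^{-17}$ replaced by 0)

      ECHELON P,(-Q),Q0-P0
 1  0  2  0  4  0   7
 0  1  3  0  5  0   8
 0  0  0  1  0  0   9
 0  0  0  0  0  1  10
 0  0  0  0  0  0   0

The nonpivot columns are E[;3 5], and so $x_3\ (= t_3)$ and $x_5\ (= s_2)$ may be chosen arbitrarily. Setting $x_3 = u_1$ and $x_5 = u_2$, the solutions are

$$x_1 = 7 - 2u_1 - 4u_2, \quad x_2 = 8 - 3u_1 - 5u_2, \quad x_3 = u_1, \quad x_4 = 9, \quad x_5 = u_2, \quad x_6 = 10,$$

or

$$T = \begin{bmatrix} 7 \\ 8 \\ 0 \end{bmatrix} + \begin{bmatrix} -2 & -4 \\ -3 & -5 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \qquad S = \begin{bmatrix} 9 \\ 0 \\ 10 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.$$

Hence

$$p'(U) = p_0 + PT = \begin{bmatrix} 14 \\ 5 \\ -1 \\ -9 \\ -4 \end{bmatrix} + u_1\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + u_2\begin{bmatrix} -20 \\ -20 \\ -4 \\ -3 \\ -3 \end{bmatrix},$$

the line through $(14, 5, -1, -9, -4)$ and $(-6, -15, -5, -12, -7)$. Using $S$ instead of $T$, we have

$$q_0 + QS = \begin{bmatrix} 22 \\ 52 \\ 8 \\ 19 \\ 14 \end{bmatrix} + Q\begin{bmatrix} 9 \\ u_2 \\ 10 \end{bmatrix} = \begin{bmatrix} 14 \\ 5 \\ -1 \\ -9 \\ -4 \end{bmatrix} + u_2\begin{bmatrix} -20 \\ -20 \\ -4 \\ -3 \\ -3 \end{bmatrix}.$$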


The reason for the mysterious appearance and disappearance of $u_1$ in the last example is that the four points $p_0, p_1, p_2, p_3$ are coplanar and hence do not define a 3-flat. The point $p_3$ can be discarded without changing the problem. If this is done, then only one parameter appears in the calculation.
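The coplanarity claim is easy to check directly. In this small Python sketch (illustrative names only), the difference vectors of Example 4.19 satisfy $p_3 - p_0 = 2(p_1 - p_0) + 3(p_2 - p_0)$, while $p_1 - p_0$ and $p_2 - p_0$ are not parallel, so $P$ has rank 2.

```python
p_pts = [(-21, -30, -8, -15, -10), (-16, -25, -7, -13, -8),
         (-21, -30, -8, -16, -11), (-11, -20, -6, -14, -9)]
p0 = p_pts[0]
a, b, c = ([pi[j] - p0[j] for j in range(5)] for pi in p_pts[1:])

# The third difference vector is a combination of the first two ...
dependent = c == [2 * x + 3 * y for x, y in zip(a, b)]
# ... while the first two are independent (some 2-by-2 minor is nonzero).
independent = any(a[i] * b[j] - a[j] * b[i] != 0
                  for i in range(5) for j in range(5))
print(dependent, independent)   # True True
```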

The next two propositions give connections between the theory of flats and previous work on linear equations and affine functions.

Proposition 4.17 The set of solutions of a system of linear equations is a flat. More precisely, the set of solutions of $k$ irredundant linear equations in $n$ variables is an $(n - k)$-dimensional flat in $R^n$.

Conversely, every flat in $R^n$ is the set of solutions of a system of linear equations in $n$ variables.

Proof The first statement is just Proposition 3.7 in different terminology. The converse statement may be proved by generalizing the procedure of Example 4.6. ■
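The counting in Proposition 4.17 can be sketched in Python: the number of irredundant equations is the number of pivots of the coefficient matrix, so a consistent system in $n$ variables has an $(n - \text{pivots})$-dimensional solution flat. The helper names are hypothetical, and the sample coefficient matrix is the one of Example 4.25. The function returns `None` when the augmented column is a pivot, that is, when the system is inconsistent.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def solution_flat_dimension(A, b):
    """Dimension of the flat {X : AX = b}; None if the system is inconsistent."""
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(M)
    n = len(A[0])
    if n in pivots:            # pivot in the constant column: no solutions
        return None
    return n - len(pivots)     # n minus the number of irredundant equations

A = [[1, 1, 5, 9], [2, 1, 7, 13], [0, 2, 6, 10], [2, 4, 16, 28]]
print(solution_flat_dimension(A, [0, 0, 0, 0]))      # 2
print(solution_flat_dimension(A, [16, 23, 18, 50]))  # 2
print(solution_flat_dimension(A, [1, 0, 0, 0]))      # None
```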


Recall that the image of a function $f: R^n \to R^m$ is the set of all vectors $Y$ in $R^m$ such that $Y = f(X)$ for some vector $X$ in $R^n$. If $S$ is a subset of $R^n$, we will refer to the set of vectors $Y$ in $R^m$ such that $Y = f(X)$ for some $X$ in $S$ as the image of $S$ under $f$. The image of $S$ under $f$ is often written $f(S)$.


Proposition 4.18 Let $T(X) = b + AX$ be an affine function $T: R^n \to R^m$. The image of $T$ is an $r$-dimensional flat in $R^m$, where $r$ is the rank of $A$. If $\pi$ is a flat in $R^n$, then the image of $\pi$ under $T$ is a flat in $R^m$. Further, if $r = n$, then $\pi$ and $T(\pi)$ have the same dimension.


Proof The image of $T$ is the set of vectors in $R^m$ swept out by $T(X) = b + AX$ as $X$ varies in $R^n$, an $r$-dimensional flat by Proposition 4.13. Suppose that $p(U) = p_0 + PU$ is an $l$-dimensional flat in $R^n$, where $P$ has $l$ linearly independent columns. Then

$$T(p(U)) = b + A(p_0 + PU) = (b + Ap_0) + (AP)U,$$

a flat in $R^m$. Since $P$ has linearly independent columns, it has a left inverse $L$. If $r = n$, then $A$ has a left inverse $L_1$, and so $AP$ has the left inverse $LL_1$. Thus $AP$ has $l$ columns, and these columns are linearly independent. This shows that $T(p(U))$ is $l$-dimensional as well. ■


FIGURE 4.25


Example 4.20 Let the affine function $T: R^3 \to R^3$ be given by

$$T(X) = AX, \qquad \text{where } A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Since the rank of $A$ is two, the image of $T$ is a plane in $R^3$. In fact, since $T(x, y, z) = (x, y, 0)$, the image of $T$ is the $xy$ plane; $T$ is a projection onto the $xy$ plane. Most planes in $R^3$ are mapped onto the $xy$ plane by $T$, and so their dimension is preserved. However, planes parallel to the $z$ axis, such as the $xz$ plane and the $yz$ plane, are mapped onto lines in the $xy$ plane (Figure 4.25). Similarly, most lines in $R^3$ project to lines in the $xy$ plane. Lines parallel to the $z$ axis, however, project to points in the $xy$ plane.
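Following the proof of Proposition 4.18, the image of the flat $p_0 + PT$ is $(b + Ap_0) + (AP)T$, so its dimension is the rank of $AP$. The Python sketch below checks the behaviour described in Example 4.20; the helper names and the sample "tilted" plane are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def rank(M):
    return len(rref(M)[1])

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]      # the projection of Example 4.20

# Direction matrices P (columns = direction vectors) of flats p0 + PT.
xz_plane = [[1, 0], [0, 0], [0, 1]]        # spanned by (1,0,0) and (0,0,1)
tilted_plane = [[1, 0], [0, 1], [1, 1]]    # spanned by (1,0,1) and (0,1,1)
z_line = [[0], [0], [1]]                   # parallel to the z axis

for name, P in [("xz plane", xz_plane), ("tilted plane", tilted_plane),
                ("z line", z_line)]:
    print(name, rank(P), "->", rank(matmul(A, P)))
# xz plane 2 -> 1
# tilted plane 2 -> 2
# z line 1 -> 0
```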


Proposition 4.18 can be used as the basis of a purely geometric definition of affine map. Affine maps can be characterized as those maps that carry flats to flats and preserve parallelism. (Parallelism in $R^n$ will be defined in the next section.)


COORDINATES FOR A k-FLAT IN $R^n$

Geometrically all lines and planes are the same, even if some lie in $R^3$ and some lie in $R^n$. Every plane is in some sense a copy of $R^2$, every line a copy of $R^1$, and every 3-flat a copy of space, $R^3$. More generally, every $k$-flat is in some sense a copy of $R^k$.

Given a $k$-flat $\pi$ in $R^n$, we can make this statement explicit by choosing a coordinate system for $\pi$. Such a coordinate system will assign a unique $k$-tuple of coordinates $(x'_1, x'_2, \ldots, x'_k)$ to each point in $\pi$. Although this general statement is interesting and occasionally useful, the most important case arises when $k = n$. In this case $\pi = R^n$, and we are simply defining a new set of coordinates for $R^n$. In Section 4.2 this process was carried out for $k = n = 2$ and $k = n = 3$.

If $\pi$ is a $k$-flat in $R^n$, then any set of $k + 1$ points $p_0, p_1, \ldots, p_k$ defining $\pi$ can be used to set up an affine coordinate system for $\pi$. The origin is $p_0$, and the $x'_i$ axis is the line (1-flat) through $p_0$ and $p_i$. The unit point on the $x'_i$ axis is $p_i$ (see Figures 4.13 and 4.14). If the point $p$ in $R^n$ has coordinates $(x_1, x_2, \ldots, x_n)$ and $p$ lies in $\pi$, then the new coordinates $(x'_1, \ldots, x'_k)$ of $p$ are given by

$$X = p_0 + PX', \quad \text{where } P = [p_1 - p_0 \mid p_2 - p_0 \mid \cdots \mid p_k - p_0],\ X = [x_1\ x_2\ \cdots\ x_n]^T,\ X' = [x'_1\ x'_2\ \cdots\ x'_k]^T.$$

In other words, the new coordinates of $p$ are just the parameters giving $p$ in the parametric representation of the $k$-flat given by $p_0, p_1, \ldots, p_k$. It is precisely the same formula as we had in Section 4.2 for the special cases $k = n = 2$ and $k = n = 3$.

In the most general case the matrix $P$ is not square (it is $n$ by $k$), but since the columns of $P$ are linearly independent, $P$ always has left inverses. If $L$ is any one of these left inverses, then

$$X' = L(X - p_0).$$

This latter formula is valid only when the components of $X$ are coordinates of a point in $\pi$, since these are the only points with $X'$ coordinates. Convenient choices for $L$ are ⌹P and (k,n)↑GAUSS P. The latter approach can be used for hand computation.
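The choice (k,n)↑GAUSS P keeps the first $k$ rows of the matrix $G$ that row-reduces $P$. A hedged Python analogue: row-reduce $[P \mid I]$, so that the right half becomes $G$, and keep its first $k$ rows. The function names are hypothetical, and the sample plane is the one through the first three points of Example 4.19. Because $X' = L(X - p_0)$ is meaningful only for points of $\pi$, the sketch also maps $X'$ back through $p_0 + PX'$ and reports whether it recovers $X$.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def left_inverse(P):
    """First k rows of G, where [P | I] row-reduces to [GP | G]."""
    n, k = len(P), len(P[0])
    aug = [list(P[i]) + [int(i == j) for j in range(n)] for i in range(n)]
    E, pivots = rref(aug)
    if pivots[:k] != list(range(k)):
        raise ValueError("columns of P are not linearly independent")
    return [row[k:] for row in E[:k]]

def new_coordinates(p0, P, X):
    """X' = L(X - p0), plus a flag telling whether X actually lies in the flat."""
    L = left_inverse(P)
    d = [x - y for x, y in zip(X, p0)]
    Xp = [sum(l * di for l, di in zip(row, d)) for row in L]
    back = [p0[i] + sum(P[i][j] * Xp[j] for j in range(len(Xp)))
            for i in range(len(P))]
    return Xp, back == list(X)

p0 = (-21, -30, -8, -15, -10)
P = [[5, 0], [5, 0], [1, 0], [2, -1], [2, -1]]
coords, inside = new_coordinates(p0, P, (14, 5, -1, -9, -4))
print([str(x) for x in coords], inside)                 # ['7', '8'] True
print(new_coordinates(p0, P, (0, 0, 0, 0, 0))[1])       # False
```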


Example 4.21 Let $\pi$ be the plane in $R^4$ through the three points $p_0 =$
(1, 1.0, 2, py = (2, -3, <1 2, —2, =2,3). 
(a) Using $p_0, p_1, p_2$ to define a coordinate system on $\pi$, write the coordinate-change formula.


(b) What are the original coordinates of the point with new coordinates $(x'_1, x'_2) =$
a, ty? 


(c) Does the point $(1, 3, 0, -2)$ lie in $\pi$?


Solution
(a) The coordinate-change formula is $X = p_0 + [p_1 - p_0 \mid p_2 - p_0]X'$, or


¥ 1 I 1) Ps] 
ty =1) |-2 -1) by 


Win) [a 
x 1 1 
(b) 
1 Pt ('] 3 
ET | |-4 
o}* 2) Es 
1 Teor 4 




(c)
*i2 1-1 -1 


is a left inverse for $P$, and so if $(1, 3, 0, -2)$ lies in $\pi$, it will have new coordinates:


1 1 
¢ 3 al 3}_|-1 fell 
1 =t =1)}] 0)7} o]f Lk 2. 
-2 1 
But the only point in $\pi$ with $X'$ coordinates $(-6, 7)$ is


HPL 


Thus the formula $X' = L(X - p_0)$ does not apply to $(1, 3, 0, -2)$; that is, $(1, 3, 0, -2)$ does not lie in $\pi$.


Notice that part (c) of Example 4.21 is just a typical problem from Section 2.3 
dressed up in new terminology (see Examples 2.25 and 2.26). 

Now let us restrict our attention to the case $k = n$. The matrix $P$ is then square and invertible, and the coordinate-change formulas

$$X = p_0 + PX', \qquad X' = P^{-1}(X - p_0)$$

apply to all points in $R^n$. The basic formulas developed in Section 4.2 for $n = 2, 3$ apply without change for any $n$. The proofs of the next two propositions are left as exercises.


Proposition 4.19 Let $T: R^n \to R^n$ be an affine transformation given by $Y = T(X) = b + AX$. Introduce $X'$ coordinates via the formula $X = p_0 + PX'$. Then in the new coordinate system $T$ is given by

$$Y' = c + BX'$$

where $B = P^{-1}AP$ and $c = P^{-1}(b + (A - I)p_0)$.

In particular, if $T$ is linear ($b = 0$) and the coordinate change leaves the origin fixed ($p_0 = 0$), we have

$$Y' = (P^{-1}AP)X'. \qquad ■$$




Proposition 4.20 Let $f: R^n \to R$ be a quadratic function given by

$$f(X) = c + BX + X^TAX$$

with $A = A^T$. Introduce $X'$ coordinates via the formula $X = p_0 + PX'$. Then in the new coordinate system $f$ is given by

$$f(X) = c' + B'X' + X'^TA'X'$$

where $c' = f(p_0)$, $B' = (B + 2p_0^TA)P$, and $A' = P^TAP = A'^T$.

In particular, if $f$ is a quadratic form ($c = 0$, $B = 0$) and the coordinate change leaves the origin fixed ($p_0 = 0$), then

$$f(X) = X'^T(P^TAP)X'. \qquad ■$$
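Proposition 4.20 is easy to test numerically. The Python sketch below computes $c'$, $B'$, $A'$ and checks that $f(p_0 + PX')$ agrees with $c' + B'X' + X'^TA'X'$ on a grid of integer points $X'$. The function names and the sample $c$, $B$, $A$, $p_0$, $P$ are made-up illustrative choices, not from the text.

```python
def quad(c, B, A, X):
    """f(X) = c + BX + X^T A X, with B a row vector."""
    n = len(X)
    return (c + sum(B[i] * X[i] for i in range(n))
            + sum(X[i] * A[i][j] * X[j] for i in range(n) for j in range(n)))

def change_coordinates(c, B, A, p0, P):
    """c', B', A' of Proposition 4.20 for the substitution X = p0 + P X'."""
    n, k = len(P), len(P[0])
    c_new = quad(c, B, A, p0)                                        # f(p0)
    row = [B[j] + 2 * sum(p0[i] * A[i][j] for i in range(n)) for j in range(n)]
    B_new = [sum(row[i] * P[i][j] for i in range(n)) for j in range(k)]  # (B + 2 p0^T A) P
    A_new = [[sum(P[i][a] * A[i][j] * P[j][b] for i in range(n) for j in range(n))
              for b in range(k)] for a in range(k)]                  # P^T A P
    return c_new, B_new, A_new

# A small made-up example in R^2 (A symmetric, P invertible).
c, B, A = 1, [2, -1], [[3, 1], [1, 2]]
p0, P = [1, -2], [[1, 1], [0, 2]]
c2, B2, A2 = change_coordinates(c, B, A, p0, P)

ok = True
for x1 in range(-3, 4):
    for x2 in range(-3, 4):
        Xp = [x1, x2]
        X = [p0[i] + sum(P[i][j] * Xp[j] for j in range(2)) for i in range(2)]
        ok = ok and quad(c, B, A, X) == quad(c2, B2, A2, Xp)
print(c2, B2, A2, ok)   # 12 [4, -10] [[3, 5], [5, 15]] True
```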


Example 4.22 Let the linear transformation $T: R^5 \to R^5$ have matrix

$$A = \begin{bmatrix} 0 & 15 & -6 & -7 & 11 \\ 2 & -9 & 6 & 3 & -10 \\ 1 & -3 & 1 & 7 & 1 \\ 0 & -12 & 4 & 6 & -8 \\ -4 & 12 & -10 & 0 & 17 \end{bmatrix}.$$

Choose a new coordinate system for $R^5$ using

$p_0 = 0$, $p_1 = (1, 0, -2, 0, -1)$,
$p_2 = (-2, 1, 2, 1, 0)$, $p_3 = (0, -1, 3, 0, 3)$,
$p_4 = (-3, 2, 1, 2, -2)$, $p_5 = (6, -5, 0, -4, 7)$.

What is the matrix of $T$ in the new coordinate system?

Solution Since $p_0 = 0$, the coordinate-change formula is just $X = PX'$, with $P$ given by

$$P = \begin{bmatrix} 1 & -2 & 0 & -3 & 6 \\ 0 & 1 & -1 & 2 & -5 \\ -2 & 2 & 3 & 1 & 0 \\ 0 & 1 & 0 & 2 & -4 \\ -1 & 0 & 3 & -2 & 7 \end{bmatrix},$$

and the matrix of $T$ in the new coordinate system is $P^{-1}AP$, or


      (⌹P)+.×A+.×P
 1.00E0    8.75E¯18  1.01E¯17  0.00E0  0.00E0
¯2.72E¯18  2.00E0    0.00E0    0.00E0  0.00E0
¯8.49E¯19 ¯1.25E¯17  3.00E0    0.00E0  0.00E0
 0.00E0    2.51E¯17  4.91E¯17  4.00E0  0.00E0
 2.99E¯19  1.05E¯17  2.53E¯17  0.00E0  5.00E0

Notice that in this new coordinate system $T$ has a simple geometric description: the $x'_i$ axis is stretched by the factor $i$.
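With exact rational arithmetic the round-off entries disappear and $P^{-1}AP$ comes out exactly diagonal. A minimal Python sketch, assuming a Gauss-Jordan inverse in place of the APL primitive ⌹ (the helper names are ours):

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def inverse(M):
    """Inverse of a square matrix by row-reducing [M | I]."""
    n = len(M)
    E, pivots = rref([list(M[i]) + [int(i == j) for j in range(n)] for i in range(n)])
    if pivots[:n] != list(range(n)):
        raise ValueError("matrix is singular")
    return [row[n:] for row in E]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[0, 15, -6, -7, 11], [2, -9, 6, 3, -10], [1, -3, 1, 7, 1],
     [0, -12, 4, 6, -8], [-4, 12, -10, 0, 17]]
pts = [(1, 0, -2, 0, -1), (-2, 1, 2, 1, 0), (0, -1, 3, 0, 3),
       (-3, 2, 1, 2, -2), (6, -5, 0, -4, 7)]          # p1, ..., p5 (p0 = 0)
P = [[p[i] for p in pts] for i in range(5)]           # column i of P is p_(i+1)

B = matmul(inverse(P), matmul(A, P))
print([[int(x) for x in row] for row in B])
# [[1, 0, 0, 0, 0], [0, 2, 0, 0, 0], [0, 0, 3, 0, 0], [0, 0, 0, 4, 0], [0, 0, 0, 0, 5]]
```

Equivalently, $Ap_i = i\,p_i$ for each $i$, which is exactly the stretching just described.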


Example 4.23 Let $f: R^5 \to R$ be

$$\begin{aligned} f(x_1, x_2, x_3, x_4, x_5) = {} & 56x_1^2 + 126x_1x_2 + 64x_1x_3 - 22x_1x_4 - 20x_1x_5 \\ & + 120x_2^2 + 42x_2x_3 - 48x_2x_4 + 36x_2x_5 + 24x_3^2 \\ & - 10x_3x_4 - 32x_3x_5 + 10x_4^2 - 4x_4x_5 + 21x_5^2 \end{aligned}$$

Find the expression for $f$ in the new coordinate system for $R^5$ defined in Example 4.22.


Solution $f(X) = X^TAX$, where $A$ is given by

$$A = \begin{bmatrix} 56 & 63 & 32 & -11 & -10 \\ 63 & 120 & 21 & -24 & 18 \\ 32 & 21 & 24 & -5 & -16 \\ -11 & -24 & -5 & 10 & -2 \\ -10 & 18 & -16 & -2 & 21 \end{bmatrix}.$$

Using the matrix $P$ of Example 4.22, $f$ is given in the new coordinate system by $f(X) = X'^T(P^TAP)X'$. Computing $P^TAP$, we have

      (⍉P)+.×A+.×P
1.000  0.000  0.000  0.000  0.000
0.000  2.000  0.000  0.000  0.000
0.000  0.000  3.000  0.000  0.000
0.000  0.000  0.000  4.000  0.000
0.000  0.000  0.000  0.000  5.000

or

$$f(X) = x_1'^2 + 2x_2'^2 + 3x_3'^2 + 4x_4'^2 + 5x_5'^2.$$

Clearly $f$ has a global minimum at $X = X' = 0$.
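The same matrix $P$ diagonalizes this quadratic form, and integer arithmetic is enough to confirm it. In this Python sketch (helper names are ours), $P^T$ is formed directly from the rows $p_1, \ldots, p_5$, and a spot check with $X' = (1, 1, 1, 1, 1)$ should give $1 + 2 + 3 + 4 + 5 = 15$.

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[56, 63, 32, -11, -10], [63, 120, 21, -24, 18], [32, 21, 24, -5, -16],
     [-11, -24, -5, 10, -2], [-10, 18, -16, -2, 21]]
pts = [(1, 0, -2, 0, -1), (-2, 1, 2, 1, 0), (0, -1, 3, 0, 3),
       (-3, 2, 1, 2, -2), (6, -5, 0, -4, 7)]
P = [[p[i] for p in pts] for i in range(5)]   # columns p1, ..., p5
PT = [list(p) for p in pts]                   # rows of P^T are p1, ..., p5

D = matmul(PT, matmul(A, P))
print(D)
# [[1, 0, 0, 0, 0], [0, 2, 0, 0, 0], [0, 0, 3, 0, 0], [0, 0, 0, 4, 0], [0, 0, 0, 0, 5]]

# Spot check: X' = (1, 1, 1, 1, 1), so X = P X' is the vector of row sums of P.
X = [sum(row) for row in P]
f = sum(X[i] * A[i][j] * X[j] for i in range(5) for j in range(5))
print(f)   # 15
```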




EXERCISES 4.4 


In exercises 1 through 5 find the dimension of the flat defined by the given points.
1. py =.=, -4, -2). p, = (4, 3, -2), ps = (3.0, -6. =). 
Ps = (3, -2, -8,1) 

2 pyo=(-L NS. py 
3. py =(-7, -9,0,0), 
Pa = (=11, ~10,4, =7) 


10,2, ~3), 


=4), py = (2, -2, 4, = 6), p, 6), 
(3.1, 11,5) 
8,1, ~6), py = (1, 2,-9.0. —7), 
=1,8, -3), py = (5.2, 21, = 12, — 11) 


In exercises 6 through 10 find the flat defined by the set of equations. Use Proposition 4.17 to deduce the number of irredundant equations.


6 We +2y + 102 = 18 Lee + 4x Hy 
3x4 2y +122 = 22 =Ny + Dry + 45 — xy 
xt y+ Sr= 9 


Ny = 23 — 4x — 6x, 
Hay + 3uy Tey + They 
x, — 4xy — 10x, — 16x, 
Dry + 6x) + 10x, 


10, x, - 2x, — 4xy+ xy 
Hy, +5 + Hy = 3K, 
~ Mt Rt Bt 

= 1x, + 2x, 

+ 6x5 


In exercises 11 through 15 find the intersection, if any, of the given flats. For questions of parallelism see exercises 21 through 24 of Exercises 4.5.

11. The 3-flat through $(5, -7, -4, 9)$, $(6, -6, -6, 8)$, $(3, -8, 1, 11)$, $(1, -10, 6, 12)$ and the line through $(-15, -20, 42, 35)$ and $(-15, -20, 43, 33)$.

12. The plane through $(-6, -5, -3, -2)$, $(-5, -4, -2, -2)$, $(-7, -5, -4, -1)$ and the plane through $(-18, -9, 9, 6)$, $(-17, -6, -9, 8)$, $(-19, -7, -10, 9)$.

13. The 3-flat through $(4, 4, 8, 0)$, $(5, 2, -9, 6)$, $(3, 7, -9, 5)$, $(1, 11, -6, 3)$ and the line through $(8, -5, -12, 12)$, $(21, -38, -15, 27)$.

14. The plane through $(-2, 2, 5, 2, 6)$, $(-1, -3, 5, 2, 4)$, $(1, -2, 4, 1, 5)$ and the plane through $(-1, -2, 3, 3, 6)$, $(0, -1, 0, 3, 6)$, $(0, -5, 7, 0, 1)$.

15. The 3-flat through $(3, 4, 0, 9)$, $(4, 3, 1, 10)$, $(4, 4, -1, 9)$, $(-1, 6, 1, 6)$ and the plane through $(-5, 10, -3, 2)$, $(0, 7, -3, 6)$, $(-10, 12, -2, 0)$.


In exercises 16 through 21 a function is given along with a set of points defining a new coordinate system. Find the expression for the function in the new coordinate system.




1 4-20 
1-3 tt 
16. (Computer assignment) 7: R'—» RV is linear with matrix } > yyy 
1-8 32 
Po = 0. pry = (10,0, =), py = (=U Le 12). py =O. 12, Ds py = (112, 3) 
17. (Computer assignment) $T: R^5 \to R^5$ is linear with matrix
8 -2 -3 -6 -7 
3 1-4 -5 3 
3-2 -1 -5 3 
=3 2 5 9 -3 
-3 2 2 3-2 
Po = 9% Py = (1,0,0,0, —1), pe = (eb =h =I py = (2,12, -2, =2), 
(0, —1, —1,2, Dy py = (2,1, = v 


18. (Computer assignment) $T: R^4 \to R^4$ is given by


2 1 42 
; 4 12 Ce 
y= =! 5/4 |, ay vale? 

o} [+4 -4 -3) 


Po = (1,0, = 3,2) py = (2, 1, 4.2) py = (2,2, = 4,2), py = (3.3, 4. 
Py = (0-2, 3.4) 
19. $f: R^4 \to R$ is the quadratic form


Pop. Say Nye 8) = SUE + AN A, = Ory + Op, — BEE + Beye, 


4 Dah, 4.09 = aye, + 2 


$p_0, p_1, p_2, p_3, p_4$ are as in exercise 16.
20. $f: R^5 \to R$ is the quadratic form


[ly Xap Sq Xp %s) = THF — By — 1g — IBEX, + MN 
+ UNE + Dye + 6xaey — Braxy + ISAE 


+ 26x, — Len + ITF = 1Bxyxy + BXE 


$p_0, p_1, p_2, p_3, p_4, p_5$ are as in exercise 17.
21. $f: R^4 \to R$ is the quadratic function


PO Xo Xy hy) = 19 + 4x, + Wes + 22 + Dey + AF 
$A ee + ON Ny + Oxy + Ora, 
+ Dery + 828 + 1Oxyuy + 3x7 


$p_0, p_1, p_2, p_3, p_4$ are as in exercise 18.
22. Let $p_0, q_0$ be points in $R^n$, $P, Q$ $n$-by-$k$ matrices, and $U$ a variable vector in $R^k$.


45 Subspaces 267 


Suppose that $p_0 + PU = q_0 + QU$ for all $U$. By choosing special $U$'s, show that $p_0 = q_0$ and $P = Q$.

23. Show that if Proposition 2.11 of Section 2.2 is taken to be the definition of a matrix having "linearly independent columns," then an $n$-by-0 matrix (e.g., 3 0⍴0) has "linearly independent columns." Recall that ⍳0 is a zero vector (it is the origin in $R^0$).

24. Let $P$ be a matrix, $R$ the row-echelon form of $P$ with the rows of zeros deleted, and $V$ the vector of indices of the pivot columns. Show that $P = P[;V]\,R$.
Hint: Proposition 3.1 and the definition of matrix-vector multiplication.

25. Let $\pi$ be the flat given by $p(T) = p_0 + PT$ and let $q_0$ be a point of $\pi$. Show that $\pi$ is the flat $\pi'$ given by $q(S) = q_0 + PS$.
Hint: By Proposition 4.12 $\pi'$ is included in $\pi$, and it is sufficient to show that $p_0$ is a point of $\pi'$.

26. Prove Proposition 4.15.

Exercises 27 through 36 refer to Exercises 4.2 and 4.3. Redo the indicated exercises for $R^n$ with $n > 3$. If no change is necessary for $n > 3$, so state.

27. Exercise 39 of Exercises 4.2    28. Exercise 40 of Exercises 4.2
29. Exercise 41 of Exercises 4.2    30. Exercise 42 of Exercises 4.2
31. Exercise 43 of Exercises 4.2    32. Exercise 44 of Exercises 4.2
33. Exercise 45 of Exercises 4.2    34. Exercise 21 of Exercises 4.2
35. Exercise 22 of Exercises 4.3    36. Exercise 23 of Exercises 4.3
4.5 Subspaces 
Suppose that the flat $\pi$ is defined by the points $p_0, p_1, \ldots, p_k$. Putting $P = [p_1 - p_0 \mid \cdots \mid p_k - p_0]$, we have

$$p(T) = p_0 + PT. \tag{4.3}$$

Most of the calculations in the previous section involved the matrix $P$, that is, the points $p_i - p_0$, which in general do not lie in the flat $\pi$. This practice started in Section 4.1, where we noted that although vectors would be drawn at various positions in the plane and space, the tails were, in fact, fixed to the origin. This is illustrated in Figure 4.26, where $\pi$ is a line in the plane or space. (See also Figures 4.3, 4.6, 4.11, 4.13, and 4.14.) The vectors $p_1 - p_0, \ldots, p_k - p_0$ lie in the flat given by

$$s(T) = PT. \tag{4.4}$$

This flat contains the origin, 0.

Definition 4.4 A subspace of $R^n$ is a flat containing the origin.

First a bit of notation. If $S$ is a subset of $R^n$, the image $T(S)$ of $S$ under the translation $T(X) = a + X$ will be denoted by $a + S$. The set $a + S$ consists of all vectors $v$ of the form $v = a + s$ as $s$ varies in $S$. We write $S - a$ for $-a + S$. The flat defined in Equation (4.4) is thus $\pi - p_0$. In $R^2$ and $R^3$ the flats defined by Equations (4.3) and (4.4) are parallel, and there is only one parallel flat through the origin with the same dimension as $\pi$.

FIGURE 4.26

We will now generalize these ideas to $R^n$. The basic fact we will use is given in the next proposition. Notice that if $\pi$ is a flat, then $a + \pi$ is also a flat. In fact, if $\pi$ is given by Equation (4.3), then $a + \pi$ is given by $a + p(T) = (a + p_0) + PT$.


Proposition 4.21 Let $\pi$ be a flat in $R^n$. If the flats $a + \pi$ and $b + \pi$ have any intersection at all, then they are equal.

Proof First notice that if a flat $\pi$ is given by $p(T) = p_0 + PT$ as $T$ varies, and if $q_0$ is a point in $\pi$, then the flat $\pi'$ given by $q(S) = q_0 + PS$ is equal to $\pi$. We used this fact in Section 4.4 but left the proof as exercise 25 in Exercises 4.4. To prove it we use Proposition 4.12. Since $q_0$ is in $\pi$ and $P = PI$, we have that $\pi'$ is contained in $\pi$. To show that $\pi$ is contained in $\pi'$, it is sufficient to show that $p_0$ is contained in $\pi'$. Since $q_0$ is contained in $\pi$, there is a parameter vector $T_0$ such that $q_0 = p_0 + PT_0$. Hence

$$p_0 = q_0 - PT_0 = q_0 + P(-T_0),$$

and so $p_0$ is indeed in $\pi'$.

Now consider $a + \pi$, given by $(a + p_0) + PT$, and $b + \pi$, given by $(b + p_0) + PT$. If $q$ lies in both these flats, then, by the above reasoning, both are given by $q + PT$. ■


As an immediate consequence of Proposition 4.21 we have that any two translates of a flat that contain the origin are equal.


Definition 4.5 Let $\pi$ be a flat in $R^n$. The associated subspace of $\pi$ is the flat $\pi - p$, where $p$ is any point of $\pi$. (By Proposition 4.21, this flat does not depend on the choice of $p$.)

If $\pi$ is given by Equation (4.3), then the associated subspace is given by Equation (4.4).

In $R^2$ and $R^3$, if parallel flats of the same dimension intersect, then they are equal. This is in accord with Proposition 4.21. If the flats are of differing dimensions, a line and a plane, say, then if they intersect, one is contained in the other.




Definition 4.6 Two flats in $R^n$ are said to be parallel if one of the associated subspaces is contained in the other.

The associated subspace of $\pi$ is the only subspace that is parallel to and has the same dimension as $\pi$.

The associated subspace of a point (0-flat) is the origin. Thus a point is parallel to all flats.


In Section 4.4 we saw that affine transformations carry flats to flats. We can now say more.

Proposition 4.22 Affine functions carry parallel flats to parallel flats. Linear functions are precisely those affine functions that carry subspaces to subspaces.

Proof Let $\pi_1$ and $\pi_2$ be parallel and let $S$ be the associated subspace. Suppose that $S$ is given by Equation (4.4) and that $\pi_i = a_i + S$ for $i = 1, 2$. If $f(X) = b + AX$ is an affine function $f: R^n \to R^m$, then

$$f(\pi_i) = b + A(a_i + PT) = (b + Aa_i) + (AP)T$$

as $T$ varies. Thus the associated subspace of $f(\pi_i)$ is given by $s(T) = (AP)T$ for both $i = 1$ and $i = 2$. The flats are thus parallel. If $b = 0$, then $f(S)$ is given by $(AP)T$, and so if $f$ is linear, it carries subspaces to subspaces. If, on the other hand, $b \neq 0$, then $f(0) = b \neq 0$, so the subspace $\{0\}$ is carried to the 0-flat $\{b\}$, which is not a subspace. ■


The next proposition gives the connection between subspaces and systems of linear equations. Recall (Proposition 4.17) that flats are simply solution sets for systems of linear equations.

Proposition 4.23 If the flat $\pi$ is the set of solutions of the system of equations

$$AX = b \tag{4.5}$$

then the associated subspace $S$ is the set of solutions of the associated homogeneous system

$$AX = 0 \tag{4.6}$$

That is, if $S$ is the set of solutions of (4.6) and $p$ is any solution of (4.5), then $p + S$ is the set of solutions of (4.5).

Proof Let $S$ be the set of solutions of (4.6). Since $X = 0$ is a solution of (4.6), the flat $S$ contains 0; that is, $S$ is a subspace. If $p$ is any solution of (4.5) and $s$ is any vector in $S$, then $A(p + s) = Ap + As = b + 0 = b$, and so $p + S$ consists of solutions of (4.5). On the other hand, if $q$ is any solution of (4.5), then $A(q - p) = b - b = 0$, and so $s = q - p$ is in $S$ and hence $q = p + s$ is in $p + S$. ■




Example 4.24 Consider the flat $\pi$ defined by the points $p_0 = (3, 3, -9, 5)$, $p_1 = (4, 1, -10, 5)$, $p_2 = (2, 6, -10, 4)$, $p_3 = (0, 10, -7, 2)$, $p_4 = (1, 8, -8, 2)$, and the flat $\pi'$ defined by $q_0 = (7, -6, -13, 11)$, $q_1 = (20, -39, -16, 26)$, $q_2 = (-6, 27, -10, -4)$.

Letting $P$ have columns $p_i - p_0$ and $Q$ have columns $q_i - q_0$, we have, solving $p_0 + PT = q_0 + QS$,

$$P = \begin{bmatrix} 1 & -1 & -3 & -2 \\ -2 & 3 & 7 & 5 \\ -1 & -1 & 2 & 1 \\ 0 & -1 & -3 & -3 \end{bmatrix}, \qquad Q = \begin{bmatrix} 13 & -13 \\ -33 & 33 \\ -3 & 3 \\ 15 & -15 \end{bmatrix},$$

and (with round-off errors of order $10^{-17}$ replaced by 0)

      ECHELON P,(-Q),Q0-P0
 1  0  0  1  2  ¯2  0
 0  1  0  0  3  ¯3  0
 0  0  1  1  4  ¯4  0
 0  0  0  0  0   0  1

Since the last column is a pivot column, the flats do not intersect. The columns of $-Q$, and hence $Q$, are linear combinations of the columns of $P$; in fact, $-Q = P[;1\ 2\ 3]\,E[1\ 2\ 3;5\ 6]$. Hence the subspace parallel to $\pi'$, which is given by $s(T) = QT$, is contained in the subspace parallel to $\pi$, which is given by $s(T) = PT$. Thus the two flats are parallel.


Notice that $\pi$ above is a 3-flat and $\pi'$ is a line. It can be shown that a 3-flat and a line in $R^4$ must either intersect or be parallel (exercise 44).
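Both conclusions of Example 4.24 can be reproduced with exact arithmetic. In this Python sketch (helper names are ours), "disjoint" means the constant column of $[P \mid -Q \mid q_0 - p_0]$ is a pivot column, and "parallel" is tested by checking that appending the columns of $Q$ to $P$ does not raise the rank, that is, the columns of $Q$ lie in the column space of $P$.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def rank(rows):
    return len(rref(rows)[1])

p_pts = [(3, 3, -9, 5), (4, 1, -10, 5), (2, 6, -10, 4), (0, 10, -7, 2), (1, 8, -8, 2)]
q_pts = [(7, -6, -13, 11), (20, -39, -16, 26), (-6, 27, -10, -4)]
p0, q0 = p_pts[0], q_pts[0]
P = [[p[j] - p0[j] for p in p_pts[1:]] for j in range(4)]
Q = [[q[j] - q0[j] for q in q_pts[1:]] for j in range(4)]
M = [P[j] + [-x for x in Q[j]] + [q0[j] - p0[j]] for j in range(4)]

E, pivots = rref(M)
disjoint = (len(M[0]) - 1) in pivots                          # constant column is a pivot
parallel = rank([P[j] + Q[j] for j in range(4)]) == rank(P)   # columns of Q lie in span of P
print(pivots, disjoint, parallel)   # [0, 1, 2, 6] True True
```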

Now let us forget about general flats and concentrate on subspaces.

A flat is defined by $k + 1$ points $p_0, p_1, \ldots, p_k$. For subspaces, however, we may dispense with $p_0$. If $\pi$ is a subspace, then $\pi - p_0$ is another subspace parallel to $\pi$, and so $\pi = \pi - p_0$ (both are translates of $\pi$ containing the origin; see Proposition 4.21). Thus we can always take $\pi$ to be defined by $v_0 = 0$, $v_1 = p_1 - p_0$, $\ldots$, $v_k = p_k - p_0$. When dealing with subspaces we shall always take $v_0 = 0$ and say that the subspace $\pi$ is generated by $v_1, v_2, \ldots, v_k$. The set of vectors $v_1, \ldots, v_k$ is called a generating set for the subspace that they (along with $v_0 = 0$) define. A linearly independent generating set is called a basis. Notice that the number of elements in a basis is the dimension of the subspace generated by the basis.

If the subspace $S$ is generated by $v_1, \ldots, v_k$, then a parametric representation of $S$ is all vectors

$$s(T) = PT, \qquad \text{where } T = [t_1\ t_2\ \cdots\ t_k]^T \text{ and } P[;i] = v_i.$$




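Example 4.25 Find a basis of the subspace of $R^4$ consisting of the solutions of the system of linear equations

$$\begin{aligned} x_1 + x_2 + 5x_3 + 9x_4 &= 0 \\ 2x_1 + x_2 + 7x_3 + 13x_4 &= 0 \\ 2x_2 + 6x_3 + 10x_4 &= 0 \\ 2x_1 + 4x_2 + 16x_3 + 28x_4 &= 0 \end{aligned}$$

Solution Since the system $AX = B$ is homogeneous ($B = 0$), we simply row-reduce $A$ rather than $[A \mid B]$:

$$A = \begin{bmatrix} 1 & 1 & 5 & 9 \\ 2 & 1 & 7 & 13 \\ 0 & 2 & 6 & 10 \\ 2 & 4 & 16 & 28 \end{bmatrix}$$

      ECHELON A
 1.00E0  0.00E0    2.00E0  4.00E0
 0.00E0  1.00E0    3.00E0  5.00E0
 0.00E0  1.73E¯18  0.00E0  2.78E¯17
 0.00E0  0.00E0    0.00E0  1.39E¯17

The nonpivot columns are E[;3 4], and so we set $x_3 = t_1$, $x_4 = t_2$. Then

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -2t_1 - 4t_2 \\ -3t_1 - 5t_2 \\ t_1 \\ t_2 \end{bmatrix} = t_1\begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \end{bmatrix} + t_2\begin{bmatrix} -4 \\ -5 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 & -4 \\ -3 & -5 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} t_1 \\ t_2 \end{bmatrix}.$$

Thus the subspace is generated by $v_1 = (-2, -3, 1, 0)$ and $v_2 = (-4, -5, 0, 1)$. Since $v_1, v_2$ are linearly independent, they form a basis, and the subspace is a plane.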
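The recipe of Example 4.25 (one basis vector per nonpivot column, with that free variable set to 1 and the others to 0) can be sketched in Python. The helper name `null_space_basis` is hypothetical, and exact fractions replace the book's floating-point ECHELON.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def null_space_basis(A):
    """One basis vector of {X : AX = 0} per nonpivot column."""
    E, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in range(n):
        if f in pivots:
            continue
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                 # this free variable is 1, the others 0
        for i, c in enumerate(pivots):
            v[c] = -E[i][f]                # solve for the pivot variables
        basis.append(v)
    return basis

A = [[1, 1, 5, 9], [2, 1, 7, 13], [0, 2, 6, 10], [2, 4, 16, 28]]
basis = null_space_basis(A)
print([[int(x) for x in v] for v in basis])   # [[-2, -3, 1, 0], [-4, -5, 0, 1]]
```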


Example 4.26 What is the dimension of the subspace of $R^5$ generated by $(1, -1, -2, 1, 0)$, $(0, 1, -1, 0, 1)$, $(2, 1, -7, 2, 3)$, $(4, 1, -13, 4, 5)$?

Solution Since we have a subspace, $v_0 = 0$, and a parametric representation is $s(T) = PT$, where

$$P = \begin{bmatrix} 1 & 0 & 2 & 4 \\ -1 & 1 & 1 & 1 \\ -2 & -1 & -7 & -13 \\ 1 & 0 & 2 & 4 \\ 0 & 1 & 3 & 5 \end{bmatrix}.$$

Row-reducing $P$, we get

      ECHELON P
 1.00E0    8.67E¯19  2.00E0    4.00E0
 0.00E0    1.00E0    3.00E0    5.00E0
 0.00E0    1.73E¯18  5.20E¯18  5.20E¯18
 8.67E¯19  8.67E¯19  2.60E¯18 ¯2.60E¯18
 0.00E0    8.67E¯19  3.47E¯18 ¯6.94E¯18

The first two columns are the pivot columns. Thus (compare Example 4.17) $v_1, v_2$ is a basis for the subspace generated by $v_1, v_2, v_3, v_4$; hence the subspace is a plane.
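In code, the answer to Example 4.26 is just the list of pivot columns. A brief Python sketch under the same assumptions as before (exact fractions, 0-based indices, hypothetical helper names):

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

gens = [(1, -1, -2, 1, 0), (0, 1, -1, 0, 1), (2, 1, -7, 2, 3), (4, 1, -13, 4, 5)]
P = [[g[i] for g in gens] for i in range(5)]     # column j of P is gens[j]
_, pivots = rref(P)
basis = [gens[c] for c in pivots]                # generators in the pivot columns
print(len(pivots), basis)   # 2 [(1, -1, -2, 1, 0), (0, 1, -1, 0, 1)]
```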


Subspaces are characterized by special properties that often allow them to be identified in contexts somewhat removed from the purely geometric context we have been using. These properties are illustrated in Figure 4.27.

If the points $p, q$ lie in the flat $\pi$, then neither $p + q$ nor $sp$ need lie in $\pi$ (Figure 4.27(a)). If $\pi$ is a subspace, however, then $p + q$ and $sp$ will lie in $\pi$ (Figure 4.27(b)).


Proposition 4.24 Let $S$ be a set of vectors in $R^n$. The set $S$ is a subspace of $R^n$ if and only if

1. The zero vector is in $S$.

2. If $u, v$ are in $S$, then $u + v$ is in $S$.

3. If $v$ is in $S$ and $t$ is a scalar, then $tv$ is in $S$.

Note: (a) Conditions 2 and 3 are equivalent to the statement: If $u_1, u_2, \ldots, u_k$ are in $S$ and $t_1, t_2, \ldots, t_k$ are scalars, then

$$u = t_1u_1 + t_2u_2 + \cdots + t_ku_k$$

is in $S$.

(b) In the presence of condition 3, condition 1 simply assures that the set $S$ contains at least one vector. For if $v$ is any vector in $S$ and condition 3 is true, then $0 = 0 \times v$ is also in $S$. The empty set is not a subspace.

FIGURE 4.27


Proof of Proposition 4.24 First assume that $S$ is a subspace. Then condition 1 is true by definition. Suppose that $v_1, \ldots, v_k$ is a basis for $S$ and $P[;i] = v_i$. Then $S$ consists of all vectors $s(T) = PT$ as $T = [t_1\ t_2\ \cdots\ t_k]^T$ varies.

Now, if $u = PT_1$ and $v = PT_2$, then

$$u + v = PT_1 + PT_2 = P(T_1 + T_2), \qquad tv = t(PT_2) = P(tT_2),$$

so statements 2 and 3 are true.

Conversely, assume that statements 1, 2, and 3 are true. The set $S$ contains at least one subspace: the 0-flat consisting of the origin. Since $S$ is in $R^n$, all flats involved have dimension $\le n$, so there is a $k$ such that $S$ contains a subspace $\pi$ of dimension $k$ and no subspaces of dimension $k + 1$ ($0 \le k \le n$).

If $\pi$ is not all of $S$, then there is a vector $v$ in $S$ but not in $\pi$. Let $v_1, \ldots, v_k$ be a basis of $\pi$ and let $\pi'$ be the subspace generated by $v_1, v_2, \ldots, v_k, v$. Then the dimension of $\pi'$ is greater than $k$ (Proposition 4.14); hence $\pi'$ is not contained in $S$. But if conditions 2 and 3 hold, then we can show that $\pi'$ must lie in $S$, and this contradiction shows that $v$ does not exist, that is, $S = \pi$. A typical element of $\pi'$ is

$$t_1v_1 + \cdots + t_kv_k + tv = PT + tv = u + tv, \qquad \text{where } P[;i] = v_i,\ T = [t_1\ \cdots\ t_k]^T,$$

and $u = PT$ is in $\pi$ and hence in $S$. Thus by conditions 2 and 3, $u + tv$ is in $S$, and so $\pi'$ is in $S$. ■

Any matrix $A$ has three important associated subspaces.

Definition 4.7 Let $A$ be an $m$-by-$n$ matrix.

1. The subspace of $R^m$ generated by the columns of $A$ is called the column space of $A$.

2. The subspace of $R^n$ consisting of all vectors $v$ such that $Av = 0$ is called the null space of $A$.

3. The subspace of $R^n$ generated by the rows of $A$ is called the row space of $A$.

Notice that the row space of $A$ is the column space of $A^T$. A fourth subspace, the null space of $A^T$, is sometimes useful but does not have a special name.

The row space and column space are subspaces by definition. The null space is just the set of solutions of $AX = 0$. This is a subspace by Proposition 4.23. It is instructive, however, to use Proposition 4.24 to verify these facts.




Example 4.27 Use Proposition 4.24 to verify that the row, column, and null spaces are subspaces.

Solution Since the row space of $A$ is the column space of $A^T$, it is sufficient to consider null spaces and column spaces.

Null space: Since $A0 = 0$, condition 1 is satisfied. If $Au = 0$ and $Av = 0$, then $A(u + v) = Au + Av = 0 + 0 = 0$, and so condition 2 is satisfied. If $t$ is a scalar and $Av = 0$, then $A(tv) = t(Av) = t0 = 0$, and so condition 3 is satisfied.

Column space: This was done in the proof of Proposition 4.24. A vector $v$ is in the column space if and only if $v = AT$ for some vector $T$. Thus $0 = A0$ shows that condition 1 is satisfied. If $u = AT_1$ and $v = AT_2$, then $u + v = A(T_1 + T_2)$, and so condition 2 is satisfied; and $tv = A(tT_2)$, and so condition 3 is satisfied. ■


In Chapter 3 (Proposition 3.6) it was stated that the matrices $A$ and $A^T$ have the same rank. We can use the concepts of row space and column space to prove this. First we prove a preliminary result.


Proposition 4.25 Let $A$ be $m$-by-$n$.

1. Let $C$ be $n$-by-$n$ and invertible. Then $A$ and $AC$ have the same column space.

2. Let $B$ be $m$-by-$m$ and invertible. Then $A$ and $BA$ have the same row space.

Proof Statement 2 follows from statement 1 by taking transposes (exercise 20). The column space of $AC$ is contained in the column space of $A$ by Proposition 4.12 and, since $A = (AC)C^{-1}$, the reverse inclusion holds as well. ■


Now we are ready for the main result.

Proposition 4.26 Let $A$ be a matrix. Then

$$\dim \text{row space } A = \text{rank } A = \dim \text{column space } A.$$

In particular, $\text{rank } A = \text{rank } A^T$.

Proof Let G = GAUSS A. Then $G$ is invertible and $E = GA$ is the echelon form of $A$.

Now the dimension of the column space of $A$ is the rank of $A$ (Proposition 4.13) and hence is the number of pivot columns, which is the number of leading 1's in $E$, which in turn is the number of nonzero rows of $E$. Since the leading 1's are the only nonzero entries of their columns, it follows (see below) that the nonzero rows of $E$ are linearly independent, so they form a basis of the row space of $E$, and that row space has dimension rank $A$. But since $E = GA$, $A$ and $E$ have the same row space by Proposition 4.25. ■


It is clear from the pattern of zeros and ones involved that the nonzero rows of an echelon form $E$ are linearly independent. We shall need similar observations several times below. The next proposition formalizes the observation.




Proposition 4.27 Let $A$ be a matrix and suppose that there is a vector of row indices $V$ such that A[V;] = ID ⍴V. Then $A$ has linearly independent columns.

Proof Suppose that $AX = 0$, $X$ a vector. Then

      0 = (A+.×X)[V]
        = A[V;]+.×X
        = (ID ⍴V)+.×X
        = X

So by Proposition 2.11 $A$ has linearly independent columns. ■


To apply Proposition 4.27 to the nonzero rows of an echelon form $E$, suppose that the nonzero rows are E[W;] and the pivot columns are E[;V]. Then E[W;V] = ID ⍴V, so take A = ⍉E[W;].

The dimensions of the null space and column space are also related.

Proposition 4.28 Let $A$ be a matrix. Then

$$\dim(\text{column space } A) + \dim(\text{null space } A) = \text{number of columns of } A.$$

Proof The null space is the set of solutions of $AX = 0$. To prove the proposition, we will just write down a formal description of our solution method. We take E = ECHELON A and write down the solutions of $EX = 0$ as follows: Let E[;V] be the nonpivot columns of $E$ (if there are none, then $V$ is the empty vector ⍳0). The entries of X[V] may be chosen arbitrarily. Say X[V] = T. The other coordinates of $X$ may then be written in terms of $T$. The solutions are then expressed as $X = PT$, where each column of $P$ is derived from a nonpivot column (see exercise 40 and the next example). Since T = X[V] = (P+.×T)[V] = P[V;]+.×T for all choices of $T$, P[V;] = ID ⍴V, and the columns of $P$ are linearly independent by Proposition 4.27. Thus the dimension of the null space is the number of nonpivot columns, and

$$\begin{aligned} \dim(\text{column space } A) + \dim(\text{null space } A) &= \text{number of pivot columns} + \text{number of nonpivot columns} \\ &= \text{number of columns}. \qquad ■ \end{aligned}$$
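Propositions 4.26 and 4.28 can be checked on the matrix $P$ of Example 4.26. In this Python sketch (helper names hypothetical, exact arithmetic assumed), the number of pivots plus the number of null-space basis vectors equals the number of columns, and the transpose has the same rank.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row echelon form: returns (E, pivot_columns), 0-based."""
    E = [[Fraction(x) for x in row] for row in rows]
    m, n = len(E), len(E[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if E[i][c] != 0), None)
        if p is None:
            continue
        E[r], E[p] = E[p], E[r]
        piv = E[r][c]
        E[r] = [x / piv for x in E[r]]
        for i in range(m):
            if i != r and E[i][c] != 0:
                f = E[i][c]
                E[i] = [a - f * b for a, b in zip(E[i], E[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return E, pivots

def null_space_basis(A):
    """One basis vector of {X : AX = 0} per nonpivot column."""
    E, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -E[i][f]
        basis.append(v)
    return basis

A = [[1, 0, 2, 4], [-1, 1, 1, 1], [-2, -1, -7, -13], [1, 0, 2, 4], [0, 1, 3, 5]]
rank_A = len(rref(A)[1])
nullity = len(null_space_basis(A))
rank_AT = len(rref([list(col) for col in zip(*A)])[1])
print(rank_A, nullity, len(A[0]), rank_AT)   # 2 2 4 2
```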


Example 4.28 Find bases of the row space, column space, and null space of


A 
4 “pes¥ 2 ay 
ANY We ants 
a) Ad) rapcieess! 
a Xo, fg) on se 


216 Geometry and Coordinate Systems 


Solution 


+ECECHELON A 


1000 0 0060 2.0060 0 00E0 4 00€0 
1.73618 1.0060 3 0060 1.73618 5 00£0 
0 00E0 0 00€0 0 0060 1.00£0 6.000 
0 0060 1.73E-18 3.47E-18 0. 00£0 0.000 


The pivot columns are £[; 1 2 4}; thus a basis of the column space is 
Af; 12 4) or (1, =2, —1, 1), (=1,3,0,0), (2, 3,0, = 1). 

A basis of the row space is E[:3;] or (1,0, 2, 0,4), (0, 1.3, 0, 5), (0,0, 0, 1,6). 
Notice that £[3; 1 2 4] is 1D 3, as it should be, 

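To get a basis of the null space we use the nonpivot columns E[;3 5]. We let x3 and x5 be arbitrary, and then

$$x_1 = -2x_3 - 4x_5, \qquad x_2 = -3x_3 - 5x_5, \qquad x_4 = -6x_5$$

or

$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = x_3\begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_5\begin{bmatrix} -4 \\ -5 \\ 0 \\ -6 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 & -4 \\ -3 & -5 \\ 1 & 0 \\ 0 & -6 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_3 \\ x_5 \end{bmatrix} = PT$$

So a basis of the null space is (−2, −3, 1, 0, 0), (−4, −5, 0, −6, 1). Notice that P[3 5;] = ID 2, as it should. ∎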
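The solution method in the proof of Proposition 4.28 can be sketched in Python, used here instead of APL purely for illustration. The function name null_basis_from_echelon is hypothetical (it is not the NBASE of exercise 38). The sketch assumes E is in reduced echelon form with exact entries, so the tiny round-off values printed by ECHELON above are replaced by zeros, and it uses 0-based column indices.

```python
from fractions import Fraction

def null_basis_from_echelon(E, pivots):
    """Return a list of vectors forming a basis of the null space of E.

    E is assumed to be in reduced echelon form (each pivot is 1 and is the
    only nonzero entry in its column); pivots lists, in order, the pivot
    column of each nonzero row, using 0-based indices.
    """
    n = len(E[0])
    free = [j for j in range(n) if j not in pivots]  # the nonpivot columns
    basis = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)  # this free variable is 1, the other free ones are 0
        for i, p in enumerate(pivots):
            # row i reads x[p] + (sum of E[i][j] * x[j] over free j) = 0
            x[p] = -Fraction(E[i][f])
        basis.append(x)
    return basis

def mat_vec(M, x):
    """Matrix-vector product with exact arithmetic."""
    return [sum(Fraction(a) * b for a, b in zip(row, x)) for row in M]

# Echelon form E of Example 4.28, with the 1.73E-18 round-off terms set to 0.
E = [[1, 0, 2, 0, 4],
     [0, 1, 3, 0, 5],
     [0, 0, 0, 1, 6],
     [0, 0, 0, 0, 0]]
pivots = [0, 1, 3]  # APL columns 1 2 4 in index origin 1
basis = null_basis_from_echelon(E, pivots)
for v in basis:
    print([int(c) for c in v])
# rank + nullity = number of columns (Proposition 4.28)
print(len(pivots) + len(basis) == len(E[0]))
```

Running it should print the two null-space basis vectors found above and then True.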


EXERCISES 4.5 


In exercises 1 through 5, points p0, p1, ..., pk defining a k-flat in Rⁿ are given. Is the k-flat a subspace? If so, give a basis of the subspace.


Py =U 1, =2, <0). py = (2.0, 
5. py = (13, 10, — 1-13), py 
Ps = (14, 11, —37, 0, =18), py = (11, 8, ~30, -2, <1), py = 11,9, =30, =1, = 10) 


The sets S given in exercises 6 through 10 are not subspaces, because they are not flats. Which of the conditions of Proposition 4.24 are violated? Give an example of each violation. For example, if condition 3 fails, give a v in S and a scalar t such that tv is not in S.
6. All points in space except the origin.

7. All points (x, y) in the plane such that x and y are integers (positive, negative, or zero).

8. All points in the plane in the first quadrant, axes included.
9. All points in the plane in the first quadrant, axes excluded.
10. All points in the plane in the first and third quadrants, axes included.


45° Subspaces. 277 


11. Let π be the flat defined by p0, p1, ..., pk.
(a) Show that if there is a single vector v in π and a single scalar t ≠ 1 such that tv is in π, then π is a subspace.

Hint: From v = p0 + PT1 and tv = p0 + PT2, show that 0 = p0 + PT has a solution.

(b) Show that if there are two vectors u, v in π such that u + v is in π, then π is a subspace.


In exercises 12 through 19 a matrix A is given. Find bases of the row space of A, the column space of A, and the null space of A. What is the dimension of the null space of Aᵀ?


[Matrices for exercises 12 through 19]


20. Assuming the first statement of Proposition 4.25, prove the second statement. Show that A and BA have the same null space as well.
21. Do the flats of exercise 12 in Exercises 4.4 intersect? Are they parallel?
22. Do the flats of exercise 13 in Exercises 4.4 intersect? Are they parallel?
23. Do the flats of exercise 14 in Exercises 4.4 intersect? Are they parallel?
24. Do the flats of exercise 15 in Exercises 4.4 intersect? Are they parallel?
25. Let A be an m-by-n matrix of rank r.
(a) Let H = (r,0)↓GAUSS A. Show that the columns of Hᵀ are a basis of the null space of Aᵀ.
Hint: Show that HA = 0 and that Hᵀ has independent columns, and apply Propositions 4.26 and 4.28.
(b) Show that the columns of ⍉(r,0)↓GAUSS ⍉A are a basis of the null space of A.

Hint: Apply (a) to Aᵀ.


(Computer assignment) In exercises 26 through 33 use the result of exercise 25(b) to compute the null space of the given matrix.

26. The matrix of exercise 12    27. The matrix of exercise 13
28. The matrix of exercise 14    29. The matrix of exercise 15
30. The matrix of exercise 16    31. The matrix of exercise 17
32. The matrix of exercise 18    33. The matrix of exercise 19




34. Let F: Rⁿ → Rᵐ be a linear transformation. Let S be a subspace of Rᵐ and let F⁻¹(S) be the set of vectors v in Rⁿ such that F(v) lies in S. Show that F⁻¹(S) is a subspace of Rⁿ.

Hint: Use Proposition 4.24.
In exercises 35 through 37 the matrix A of a linear transformation F: Rⁿ → Rᵐ is given and a subspace S of Rᵐ is given. Find a basis of F⁻¹(S).
Hint: If S is the set of vectors BT, then F⁻¹(S) is the set of vectors X for which there exists a vector Y such that AX = BY, that is, for which

$$[A \mid -B]\begin{bmatrix} X \\ Y \end{bmatrix} = 0$$

1 I 5 

35. [2 =! =i} sem (3.8, 10 
-I 0 -2 

1 =1 =f 

0 1 3 

=2: z 2 


. S generated by (1,0, 1), (—S, ~5,4) 


-1 -2 =8 
37, A= | 2 2 10), S generated by (=1.1, =!) 
ay ih A 


In exercises 38 through 57 define the APL function described.

38. Name: NBASE
Right argument: A matrix E in echelon form
Left argument: The vector V of indices of pivot columns of E
Result: A basis of the null space of E
Suggestion: 1 0 1 0 0 1\1 2 3 is 1 0 2 0 0 3. The function \ is called expand. To expand the number of rows of a matrix A, use ⍀ instead of \.

39. Name: PIVS
Right argument: A matrix E in echelon form
Left argument: A scalar M. The component B = E[I;J] is assumed to be zero if M = M+B. (M sets the scale for fuzzy comparisons.)
Result: The indices of the pivot columns of E
40. Name: NULLSPACE
Right argument: A matrix A
Result: A matrix B whose columns are a basis of the null space of A
(Note: You may assume that the functions of exercises 38 and 39 are defined.)
41. Name: COLSPACE
Right argument: A matrix A
Result: A basis of the column space of A

Note: You may assume that the function PIVS of exercise 39 is defined.
42. Name: CUTSPACE
Right argument: A matrix B
Left argument: A matrix A with (ρB)[1] = (ρA)[1]
Result: A basis of the intersection of the column space of A with the column space of B




Note: You may assume that the functions COLSPACE of exercise 41 and NULLSPACE of exercise 40 are defined.
43. Write a different version of NULLSPACE (exercise 40) based on exercise 25.
44. Let H be a hyperplane and π a flat in Rⁿ. Show that if H and π do not intersect, then they must be parallel.
Hint: Let H be given by p(T) = p0 + PT and let π be given by q(S) = q0 + QS, where P has rank n − 1 and the rank of Q equals the dimension of π. Consider the possible placement of pivot columns in [P | −Q | q0 − p0].


CHAPTER FIVE 


Orthogonality 


In this chapter we introduce the concepts of distance and angle for vectors in Rⁿ and define orthonormal coordinate systems. These coordinate systems have mutually perpendicular axes and are extremely important in applications.

The concepts of “distance” and “angle” are defined in Section 5.1 via the dot 
product of two vectors. Distance-preserving mappings and coordinate changes are 
also discussed. 

In Section 5.2 we show that any symmetric matrix may be diagonalized by a distance-preserving coordinate change. This introduces the subject of eigenvalues and eigenvectors for symmetric matrices. Eigenvalues of general matrices are taken up in Chapter 7. The diagonalization of a symmetric matrix is a fundamental technique in optimization (the second-derivative test, Section 5.3*), physics (Section 5.6*), and statistics.

In Section 5.4 we develop formulas for perpendicular projections and reflections and relate them to the least-squares calculations of Chapter 1.

In Section 5.5 the Householder algorithm is developed and used to automate the procedures of Section 5.4. Section 5.5 also contains a function BACKSUB that automates the process of writing down the solutions of a linear system, once the echelon form is obtained.


5.1 Distance and Angle 


In this section the concepts of “distance” and “angle” in the plane and space will 
be extended to higher-dimensional spaces. The idea is to express “distance” and 
“angle” in purely algebraic terms and then use the algebraic formulations as 
definitions in higher dimensions. 


The algebraic formulation is based on the dot product of two vectors. 


DEFINITION 5.1 Let v and w be vectors in Rⁿ. The dot product of v and w is v+.×w. Alternate notations are v·w and, if v and w are considered to be column vectors, vᵀw = wᵀv.




SI Distance and Angle 281 


FIGURE 5.1

To write the distance between two points in terms of the dot product we start with the Pythagorean theorem, which states that if the lengths of the two legs of a right triangle are a and b and if the length of the hypotenuse is c, then a² + b² = c² [Figure 5.1(a)]. Applying this result to the vector v = (a, b) in the plane, we have (length v)² = a² + b² [Figure 5.1(b)]. Applying the result twice to the vector v = (a, b, c) in space, we have (length v)² = (√(a² + b²))² + c² = a² + b² + c² [Figure 5.1(c)].

Note that if v = (v1, v2, ..., vn), then v·v = v1² + v2² + ··· + vn².
DEFINITION 5.2 Let v be a vector in Rⁿ. The length or norm of v is ‖v‖ = √(v·v).

To define the distance between two points p and q we note that it should be the length of q − p (Figure 5.2).

DEFINITION 5.3 The distance d(p, q) between p and q is d(p, q) = ‖q − p‖ = √((q − p)·(q − p)).
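As a minimal Python sketch of Definitions 5.1 through 5.3 (the helper names dot, norm, and dist are illustrative assumptions, not the book's APL functions):

```python
import math

def dot(v, w):
    """Dot product v1*w1 + ... + vn*wn (Definition 5.1)."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Length ||v|| = sqrt(v . v) (Definition 5.2)."""
    return math.sqrt(dot(v, v))

def dist(p, q):
    """Distance d(p, q) = ||q - p|| (Definition 5.3)."""
    return norm([b - a for a, b in zip(p, q)])

print(norm([3, 4]))                # 5.0
print(dist([1, 1, 1], [2, 3, 3]))  # sqrt(1 + 4 + 4) = 3.0
```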


To relate angles to the dot product we use the law of cosines from trigonometry. Recall that the law of cosines extends the Pythagorean theorem to triangles that are not right triangles by adding a correction factor:

$$c^2 = a^2 + b^2 - 2ab\cos\theta$$

where θ is the angle opposite the side of length c (Figure 5.3). If θ = π/2, then cos θ = 0, and we have the Pythagorean theorem. The law of cosines is easily derived from the Pythagorean theorem. By dropping the perpendicular indicated in Figure 5.3 and using some elementary trigonometry (Exercise 50), one finds

$$c^2 = (a\sin\theta)^2 + (b - a\cos\theta)^2$$

which simplifies to the law of cosines.

FIGURE 5.2

282 Orthogonality

FIGURE 5.3
Now refer again to Figure 5.2. According to the law of cosines

$$\|q - p\|^2 = \|q\|^2 + \|p\|^2 - 2\|q\|\,\|p\|\cos\theta$$

where θ is the angle between the vectors p and q. When we write p and q as column vectors, this equation becomes

$$(q - p)^T(q - p) = q^Tq + p^Tp - 2\|q\|\,\|p\|\cos\theta$$
$$q^Tq - q^Tp - p^Tq + p^Tp = q^Tq + p^Tp - 2\|q\|\,\|p\|\cos\theta$$

or

$$-2\,p\cdot q = -2\|p\|\,\|q\|\cos\theta$$

since

$$p^Tq = p\cdot q = q^Tp$$

Thus

$$p\cdot q = \|p\|\,\|q\|\cos\theta$$

If ‖p‖ ‖q‖ ≠ 0, this gives

$$\theta = \cos^{-1}\frac{p\cdot q}{\|p\|\,\|q\|}$$

This equation expresses the angle between two vectors in the plane or in space in terms of dot products. In particular the vectors p and q are perpendicular if and only if p·q = 0.

We would like to use the formula

$$\theta = \cos^{-1}\frac{p\cdot q}{\|p\|\,\|q\|}$$




to define the angle between the vectors p and q in Rⁿ. There is a problem that must be overcome first. Since −1 ≤ cos θ ≤ 1 for any angle θ, the same must be true for (p·q)/(‖p‖ ‖q‖), or the definition will be nonsense.


The next proposition collects this fact and several other useful results together 
for reference. 


Proposition 5.1 Let v and w be vectors in Rⁿ.
1. ‖v‖ ≥ 0, and ‖v‖ = 0 if and only if v = 0.

2. ‖av‖ = |a| ‖v‖, a any scalar.

3. |v·w| ≤ ‖v‖ ‖w‖ (Schwarz inequality).

4. ‖v + w‖ ≤ ‖v‖ + ‖w‖ (triangle inequality).

Proof We leave statements 1 and 2 as Exercises 51 and 52.
To prove the Schwarz inequality we start with the fact that for any scalar x, 0 ≤ ‖v − xw‖². Let

$$f(x) = \|v - xw\|^2 = (v - xw)^T(v - xw) = v^Tv - 2x\,v^Tw + x^2\,w^Tw = \|w\|^2x^2 - 2(v\cdot w)x + \|v\|^2 \ge 0$$

By the quadratic formula, f(x) = 0 when (if ‖w‖ = 0 there is nothing to prove)

$$x = \frac{v\cdot w \pm \sqrt{(v\cdot w)^2 - \|v\|^2\|w\|^2}}{\|w\|^2}$$

Since f(x) ≥ 0 for every x, f has at most one real root (a quadratic with two distinct real roots is negative between them), so (v·w)² − ‖v‖²‖w‖² ≤ 0. Statement 3 follows.
To prove the triangle inequality,

$$\begin{aligned} \|v + w\|^2 &= (v + w)^T(v + w) \\ &= \|v\|^2 + 2\,v\cdot w + \|w\|^2 \\ &\le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 \quad \text{by the Schwarz inequality} \\ &= (\|v\| + \|w\|)^2 \end{aligned}$$

Thus ‖v + w‖ ≤ ‖v‖ + ‖w‖. ∎


The triangle inequality is an algebraic version of the fact that the length of one side of a triangle is always less than or equal to the sum of the lengths of the other two sides (Figure 5.4).




FIGURE 5.4


DEFINITION 5.4 Let v, w be vectors in Rⁿ.
1. The vectors v and w are perpendicular, written v ⊥ w, if v·w = 0.
2. The angle between v and w is

$$\theta = \begin{cases} \cos^{-1}\dfrac{v\cdot w}{\|v\|\,\|w\|} & \text{if } \|v\|\,\|w\| \ne 0 \\ \text{undefined} & \text{if } \|v\|\,\|w\| = 0 \end{cases}$$

Notice that the zero vector is the only one perpendicular to all vectors or perpendicular to itself.
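A hedged Python sketch of Definition 5.4 follows; the names angle and perpendicular are assumptions for illustration. The clamp to [−1, 1] is an implementation choice: the Schwarz inequality guarantees the exact ratio lies in that interval, but floating-point round-off can push it slightly outside, where math.acos would fail. The exact test v·w == 0 is only reliable for integer or exact inputs.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def angle(v, w):
    """Angle between v and w in radians (Definition 5.4); None if undefined."""
    denom = math.sqrt(dot(v, v)) * math.sqrt(dot(w, w))
    if denom == 0:
        return None  # ||v|| ||w|| = 0, so the angle is undefined
    c = dot(v, w) / denom
    c = max(-1.0, min(1.0, c))  # guard against round-off just outside [-1, 1]
    return math.acos(c)

def perpendicular(v, w):
    """v is perpendicular to w when v . w = 0 (exact for integer entries)."""
    return dot(v, w) == 0

print(perpendicular([1, 2, 3], [-2, 1, 0]))  # True
print(math.degrees(angle([1, 0], [1, 1])))   # 45 degrees, up to round-off
print(angle([0, 0], [1, 1]))                 # None
```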


EXAMPLE 5.1 What is the angle between the line through (1, 2) and (−3, 4) and the line through (−1, 2) and (1, 1)?

Solution Let p0 = (1, 2), p1 = (−3, 4), q0 = (−1, 2), and q1 = (1, 1). Then the vector v = p1 − p0 = (−4, 2) is parallel to the first line and the vector w = q1 − q0 = (2, −1) is parallel to the second line. The question is slightly ambiguous, since there are two supplementary angles involved. One angle is obtained using v and w or −v and −w, and the other is obtained using −v and w or v and −w. The two angles will be

$$\cos^{-1}\frac{v\cdot w}{\|v\|\,\|w\|} \quad\text{and}\quad \cos^{-1}\frac{-v\cdot w}{\|v\|\,\|w\|}$$

The acute angle is given by cos⁻¹(|v·w|/(‖v‖ ‖w‖)), and we will take this as the solution.




vena 
Wee 1 
SAAD 20| Ve xWa( (Ve #2 )xWe #2)0-2 
0.219 
+DEG-RAD-0-180 
125 


Thus the angle is approximately .22 radians or 12.5 degrees.
EXAMPLE 5.2 Find a vector perpendicular to (1, 2, 3) in R³.


Solution The easy answer, of course, is w = 0, since the zero vector is perpendicular to all vectors, even itself. To find a nonzero vector perpendicular to (1, 2, 3) we proceed as follows.

Let w = (x, y, z) be perpendicular to v = (1, 2, 3). Then

0 = v·w = x + 2y + 3z

Thus the set of vectors perpendicular to (1, 2, 3) is the plane x + 2y + 3z = 0. Since 0 ⊥ v, this plane is a subspace, the null space of the matrix [1 2 3], which is already in echelon form. So

$$w = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = y\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$

and any choice of (y, z) other than (0, 0), for example (1, 0) or (0, 1), will give a nonzero vector perpendicular to (1, 2, 3).
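The parametrization of Example 5.2 can be checked mechanically. In this Python sketch, perp_to_123 is a hypothetical helper that returns y(−2, 1, 0) + z(−3, 0, 1), that is, it solves x + 2y + 3z = 0 for the pivot variable x:

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def perp_to_123(y, z):
    """The row [1 2 3] has its pivot in column 1, so y and z are free
    and x = -2y - 3z."""
    return (-2 * y - 3 * z, y, z)

v = (1, 2, 3)
for y, z in [(1, 0), (0, 1), (4, -7)]:
    w = perp_to_123(y, z)
    print(w, dot(v, w))  # the dot product is 0 every time
```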


EXAMPLE 5.3 Let the triangle ABC be isosceles with side AB equal to side AC. Show that angle ABC equals angle ACB.

Solution Let u = A − B and v = C − A. Then ‖u‖ = ‖v‖ and C − B = u + v. If θ1 = ∠ABC and θ2 = ∠ACB, then

$$\cos\theta_1 = \frac{u\cdot(u + v)}{\|u\|\,\|u + v\|} = \frac{u\cdot u + u\cdot v}{\|u\|\,\|u + v\|} = \frac{\|u\|^2 + u\cdot v}{\|u\|\,\|u + v\|}$$
$$= \frac{\|v\|^2 + u\cdot v}{\|v\|\,\|u + v\|} = \frac{v\cdot(u + v)}{\|v\|\,\|u + v\|} = \cos\theta_2$$

Since both angles lie between 0 and π, θ1 = θ2. ∎




EXAMPLE 5.4 Is the triangle in R⁴ with vertices p1 = (2, 0, 2, 0), p2 = (5, 3, 5, 3), and p3 = (9, −1, 9, −1) a right triangle?

Solution Let u = p1 − p2 and v = p3 − p1. Then p3 − p2 = u + v = (4, −4, 4, −4). The dot products among u, v, and u + v may all be calculated at once by forming A = [u | v | u + v] and computing AᵀA. This is because

(AᵀA)[i;j] = Aᵀ[i;] +.× A[;j] = A[;i] +.× A[;j] = A[;i]·A[;j]

      (⍉A)+.×A
 36.00 ¯36.00  0.00
¯36.00 100.00 64.00
  0.00  64.00 64.00

Note that A[;1] +.× A[;3] = u·(u + v) is zero; hence u ⊥ u + v, that is, the two sides meeting at p2 are perpendicular.

In the example above, (AᵀA)[i;i] = ‖A[;i]‖² and 64 + 36 = 100; that is, the triangle satisfies the Pythagorean theorem. The next proposition shows that this gives an alternate proof that the triangle is a right triangle.

Proposition 5.2 (Pythagorean theorem) Let u, v be vectors in Rⁿ. Then u ⊥ v if and only if

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$

Proof

$$\|u + v\|^2 = (u + v)^T(u + v) = u^Tu + v^Tv + u^Tv + v^Tu = \|u\|^2 + \|v\|^2 + 2\,u\cdot v$$

so the two sides are equal exactly when u·v = 0. ∎


The trick used in Example 5.4 for computing the dot products is quite important. To get information about the lengths and angles among a set of vectors one stores the vectors as columns of a matrix A and forms the symmetric matrix AᵀA. The i,j entry of AᵀA is the dot product of A[;i] and A[;j]. If this entry is zero, then A[;i] ⊥ A[;j]; if positive, then the angle between A[;i] and A[;j] is acute; and if negative, then the angle between A[;i] and A[;j] is obtuse. The i,i entry of AᵀA is ‖A[;i]‖². To obtain the cosine of the angle between A[;i] and A[;j], divide (AᵀA)[i;j] by ‖A[;i]‖ ‖A[;j]‖. The quickest way to compute this is

      B←(⍉A)+.×A
      COS←B÷((1 1⍉B)∘.×1 1⍉B)*÷2
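The cosine computation just described can be mirrored in Python as a sketch; it is not a translation of any particular APL function in the book. The vertices p1 = (2, 0, 2, 0), p2 = (5, 3, 5, 3), p3 = (9, −1, 9, −1) and the choice u = p1 − p2, v = p3 − p1 are assumptions for illustration; they give u + v = (4, −4, 4, −4) and the diagonal entries 36, 100, 64 printed in Example 5.4.

```python
import math

def gram(cols):
    """All dot products A^T A, where cols are the columns of A."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def cosines(G):
    """Divide G[i][j] by ||A[;i]|| ||A[;j]||, using ||A[;i]||**2 = G[i][i]."""
    d = [math.sqrt(G[i][i]) for i in range(len(G))]
    return [[G[i][j] / (d[i] * d[j]) for j in range(len(G))] for i in range(len(G))]

p1, p2, p3 = (2, 0, 2, 0), (5, 3, 5, 3), (9, -1, 9, -1)  # assumed vertices
u = [a - b for a, b in zip(p1, p2)]       # p1 - p2
v = [a - b for a, b in zip(p3, p1)]       # p3 - p1
u_plus_v = [a + b for a, b in zip(u, v)]  # equals p3 - p2
G = gram([u, v, u_plus_v])
for row in G:
    print(row)
print(cosines(G)[0][2])  # 0.0: u is perpendicular to u + v
```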


The angles between vectors in high-dimensional spaces are often used in statistical analyses, although they are not usually thought of as such.


THE CORRELATION COEFFICIENT 


Let x1, x2, ..., xn be the measurements of one characteristic (e.g., height in inches) of the n individuals in a population and let y1, y2, ..., yn be the measurements of another characteristic of the population (e.g., weight in pounds). Let x̄ be the average of the x's and ȳ the average of the y's. The correlation coefficient is usually defined as

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^{n}(x_i - \bar x)^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar y)^2}}$$

and is considered to be a measure of the "relatedness" of the two measurements.

The formula for r_xy shows that it is cos θ for some angle θ in Rⁿ, but what angle?

Recall (Section 1.3) that, owing to the often arbitrary nature of statistical scores, measurements are often reduced to standard scores or z-scores before being analyzed statistically. The vector of standard scores for the x measurements is z_x = (X − x̄) ÷ σ_x, and that for the y measurements is z_y = (Y − ȳ) ÷ σ_y, where X[i] = x_i, Y[i] = y_i, σ_x is the standard deviation of X, and σ_y is the standard deviation of Y.

Notice that

$$\|z_x\| = \left\|\frac{X - \bar x}{\sigma_x}\right\| = \frac{\|X - \bar x\|}{\sigma_x} = \frac{\|X - \bar x\|}{\|X - \bar x\|/\sqrt n} = \sqrt n$$

where the second equality is Proposition 5.1, statement 2, and the third uses σ_x = ‖X − x̄‖/√n. Similarly, ‖z_y‖ = √n. In fact, all z-scores have the same length in Rⁿ and so differ only in direction. The cosine of the angle between z_x and z_y is

$$\frac{z_x\cdot z_y}{\|z_x\|\,\|z_y\|} = \frac{(X - \bar x)\cdot(Y - \bar y)}{\|X - \bar x\|\,\|Y - \bar y\|} = r_{xy}$$

Thus the correlation coefficient is the cosine of the angle in Rⁿ between the z-scores. The measurements X and Y are perfectly correlated when z_x = z_y and uncorrelated when z_x ⊥ z_y.
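To see numerically that r_xy is the cosine of the angle between the z-scores, here is a Python sketch. The height and weight values are made-up illustrative data, and σ is taken as the population standard deviation (dividing by n), which is the convention under which ‖z_x‖ = √n.

```python
import math

def mean(x):
    return sum(x) / len(x)

def z_scores(x):
    """(X - mean) / sigma, with sigma the population standard deviation."""
    m = mean(x)
    sigma = math.sqrt(sum((a - m) ** 2 for a in x) / len(x))
    return [(a - m) / sigma for a in x]

def cosine(v, w):
    """Cosine of the angle between v and w."""
    d = sum(a * b for a, b in zip(v, w))
    return d / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))

def r_direct(x, y):
    """The usual formula for the correlation coefficient r_xy."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den

x = [60, 64, 66, 70, 72]       # made-up heights
y = [120, 135, 140, 160, 158]  # made-up weights
zx, zy = z_scores(x), z_scores(y)
print(math.sqrt(sum(a * a for a in zx)), math.sqrt(len(x)))  # both sqrt(5)
print(cosine(zx, zy), r_direct(x, y))  # the same number, up to round-off
```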

When correlations are viewed as angles, certain properties become geometri- 
cally evident (see exercise 53). 

The next proposition gives some useful results involving matrices and dot products.


Proposition 5.3 Let v and w be vectors in Rⁿ and let A be m by n.
1. v·(Aw) = (Aᵀv)·w.

2. If (Av)·w = v·(Aw) for all v and w in Rⁿ, then A is symmetric.

3. If (Av)·(Aw) = v·w for all v and w in Rⁿ, then A is invertible and A⁻¹ = Aᵀ.

Proof
1. Writing v and w as column vectors gives

v·(Aw) = vᵀAw = (Aᵀv)ᵀw = (Aᵀv)·w

2. Notice that if v = (ID n)[;i] and w = (ID n)[;j], then

v·(Aw) = vᵀA[;j] = A[i;j]

So A[i;j] = v·(Aw) = (Av)·w = w·(Av) = A[j;i]; that is, A = Aᵀ.

3. v·w = (Av)·(Aw) = v·(AᵀAw) by statement 1. Now take v and w as in the proof of statement 2 and set I = ID n. Then I[i;j] = v·(Iw) = v·w = v·(AᵀAw) = (AᵀA)[i;j]. So AᵀA = I. ∎


DEFINITION 5.5 A matrix A is orthogonal if Aᵀ is an inverse for A.
For example, the rotation matrix

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

is orthogonal. Notice that a rotation of the plane does not change the distances between points or alter the angles between lines.
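A quick numerical check of Definition 5.5, sketched in Python: is_orthogonal is a hypothetical helper that tests whether AᵀA equals the identity up to a round-off tolerance, which suffices for a square matrix A. The rotation uses the counterclockwise convention for R_θ shown above.

```python
import math

def rotation(theta):
    """R_theta: counterclockwise rotation of the plane through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_orthogonal(A, tol=1e-12):
    """For a square A: True if A^T A is the identity, up to round-off tol."""
    P = matmul(transpose(A), A)
    n = len(P)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

print(is_orthogonal(rotation(-math.pi / 6)))  # True: a 30-degree clockwise rotation
print(is_orthogonal([[3, 0], [0, 3]]))        # False: A^T A = 9I
```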


DEFINITION 5.6 Let f: Rⁿ → Rⁿ be a function.

1. We say that f preserves dot products if v·w = f(v)·f(w) for all pairs of vectors v, w in Rⁿ.

2. We say that f preserves distances, or that it is an isometry or a congruence, if d(f(p), f(q)) = d(p, q) for all points p, q in Rⁿ.

3. We say that f preserves angles, or is a similarity, if given three points p0, p1, p2 in Rⁿ the angle between p1 − p0 and p2 − p0 is the same as the angle between f(p1) − f(p0) and f(p2) − f(p0) (Figure 5.5).

A similarity need not preserve distances (exercise 44), but it can be shown that an isometry necessarily preserves angles. The next proposition shows this for affine transformations.


Proposition 5.4 Let Y = f(X) = B + AX be an affine transformation from Rⁿ to Rⁿ.

1. If A is orthogonal, then f preserves distances and angles.

2. If f is an isometry, then A is orthogonal.

FIGURE 5.5 A similarity




Proof Statement 1 is exercise 39. We proceed to statement 2. Let p and q be points in Rⁿ. Then

$$d(f(p), f(q))^2 = \|f(p) - f(q)\|^2 = \|B + Ap - (B + Aq)\|^2 = \|A(p - q)\|^2 = (A(p - q))\cdot(A(p - q))$$

and d(p, q)² = ‖p − q‖² = (p − q)·(p − q). Since f is an isometry, these are equal:

$$(A(p - q))\cdot(A(p - q)) = (p - q)\cdot(p - q)$$

In particular, if we take p = 0 and q = v, we have

$$v\cdot v = (Av)\cdot(Av)$$

for every vector v in Rⁿ. If w is a second vector in Rⁿ, then

$$[A(v + w)]\cdot[A(v + w)] = (Av)\cdot(Av) + 2(Av)\cdot(Aw) + (Aw)\cdot(Aw)$$

Hence for every pair of vectors v, w in Rⁿ

$$(Av)\cdot(Aw) = \tfrac12\{[A(v + w)]\cdot[A(v + w)] - (Av)\cdot(Av) - (Aw)\cdot(Aw)\} = \tfrac12\{(v + w)\cdot(v + w) - v\cdot v - w\cdot w\} = v\cdot w$$

Thus A is orthogonal by Proposition 5.3. ∎


EXAMPLE 5.5 Which of the following affine functions are isometries?
1. f: Rⁿ → Rⁿ is a translation.

2. f: R² → R² is a 30-degree rotation clockwise.

3. f: R³ → R³ is a reflection in the xy plane.

4. f: Rⁿ → Rⁿ is Y = f(X) = 3X.

5. {RY Ris 


+X 
<a} 
rahe) 
Ry Hy 


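Solution

1. A translation is of the form f(X) = B + X, that is, A = ID n, so certainly Aᵀ = A⁻¹ and f is an isometry.

2. Y = f(X) = R_θX with θ = −π/6. Since R_θᵀR_θ = I, this is an isometry.

3. A reflection in the xy plane takes (x, y, z) to (x, y, −z), so

$$Y = f(X) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} X$$

and

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}^T \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Hence f is an isometry.

4. In this case Y = AX with

$$A = 3I = \begin{bmatrix} 3 & 0 & \cdots & 0 \\ 0 & 3 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 3 \end{bmatrix}$$

and hence AᵀA = 9I ≠ I. Thus f is not an isometry. (f is a similarity; see Exercise 45.)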


and 


0 4 0 i) 
=} =! 
‘lo 0 4 0 
0 0 0 4 


Thus f is an isometry. ∎


Although distance is defined in terms of the dot product, a function need not preserve dot products in order to preserve distances. In fact any translation is an isometry, but no nontrivial translation preserves dot products. If f(X) = a + X, a a vector in Rⁿ, then

$$f(v)\cdot f(w) = (a + v)\cdot(a + w) = a\cdot a + a\cdot(v + w) + v\cdot w$$

which differs from v·w when a ≠ 0 (take v = w = 0, for instance).

If we restrict our attention to linear transformations, however, the concepts coincide.


Proposition 5.5 Let Y = f(X) = AX be a linear transformation from Rⁿ to Rⁿ. Then the following statements are equivalent.

1. The function f preserves dot products.
2. The function f is an isometry.
3. The matrix A is orthogonal.

Proof Since distance is defined in terms of the dot product, statement 1 implies statement 2. Statement 2 implies statement 3 by Proposition 5.4. Suppose that the matrix A is orthogonal. Then by Proposition 5.3, statement 1,

$$(Av)\cdot(Aw) = v\cdot(A^TAw) = v\cdot w$$

so statement 3 implies statement 1. ∎


COORDINATE CHANGES 


Next we wish to discuss the way in which a coordinate change affects the formulas for distance and angle. Because of the way in which translations interfere with the dot product (see the discussion preceding Proposition 5.5 above), we begin by considering coordinate changes in which the origin remains fixed.


DEFINITION 5.7

1. An affine coordinate change X = p0 + PX′ is called linear if p0 = 0. That is, the new origin coincides with the old origin.

2. A linear coordinate change is said to preserve dot products if X′·Y′ = X·Y for all vectors of coordinates X, Y.

The next proposition is immediate from Proposition 5.3, statement 3.

Proposition 5.6 Let X = PX′ be a linear coordinate change in Rⁿ. Then XᵀY = X′ᵀ(PᵀP)Y′. In particular the coordinate change preserves dot products if and only if P is orthogonal.


The requirement that P be orthogonal restricts the possible coordinate changes a great deal. Suppose that the new coordinate system is defined by the points p0, p1, ..., pn. We are assuming that p0 = 0, so P[;i] = pi, and the vectors p1, p2, ..., pn form a basis of Rⁿ (Figure 5.6).

FIGURE 5.6

Now

pi·pj = P[;i]ᵀP[;j] = (PᵀP)[i;j] = (ID n)[i;j]

So pi ⊥ pj when i ≠ j and pi·pi = 1, that is, ‖pi‖ = 1. Thus: The distance and angle formulas remain the same when the new axes are mutually perpendicular and there are no scale changes.

There is a bit of (somewhat confusing) terminology associated with this situation. When applied to vectors (not matrices!) the term "orthogonal" is a synonym for "perpendicular." A unit vector is a vector v with ‖v‖ = 1. It is important to note that if v ≠ 0, then (1/‖v‖)v is a unit vector pointing in the same direction as v.

A set of vectors v1, ..., vk is called orthonormal if vi ⊥ vj for i ≠ j and ‖vi‖ = 1. Thus a matrix is orthogonal if and only if its columns are an orthonormal set of vectors.

We will refer to a coordinate system as orthonormal if the axes are mutually perpendicular and there are no scale changes.


The orthonormal linear coordinate systems in the plane are not difficult to describe (see Figure 5.6). First rotate the x and y axes counterclockwise until the positive x axis is rotated into the positive x′ axis; say that a rotation through θ is necessary. This rotation gives an intermediate coordinate system x̄ȳ, where

$$X = R_\theta\bar X$$

Now x̄ = x′ and, since the x′ and y′ axes are perpendicular, the ȳ axis coincides with the y′ axis. The positive directions on the y′ and ȳ axes may not coincide, however. If they do not, then

$$\bar X = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} X'$$

Thus the linear orthonormal coordinate changes are given by X = PX′, where

$$P = \begin{cases} R_\theta & \text{if } x'y' \text{ is a right-handed system}^\dagger \\ R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} & \text{if } x'y' \text{ is a left-handed system} \end{cases}$$

where 0 ≤ θ < 2π.


Now suppose we have a coordinate change X = q + PX′, where P is orthogonal. If X and Y are vectors of coordinates, then in general X·Y ≠ X′·Y′, but this is not important because the formulas for distance and angle are unchanged. That is, we have

d(X, Y) = ‖X − Y‖ = ‖X′ − Y′‖ = d(X′, Y′)

because the constant term q cancels from the difference

X − Y = P(X′ − Y′)

and so the calculation reduces to the linear case treated above. The same is true for the formula for cos θ, since the vectors used in calculating the angle are differences (see Examples 5.1, 5.3, and 5.4).

We close this section with an application to Euclidean plane geometry. We have said that an alternate term for isometry is "congruence." If this terminology is not arbitrary, then two objects in R² should be "congruent" if there is a congruence that maps one onto the other.

In secondary school geometry courses, on the other hand, one often says that two figures are congruent if one can be picked up and placed precisely on top of the other. It may be necessary to turn one figure over before the two can be made to match (Figure 5.7).

FIGURE 5.7 Three congruent figures in R². Figure (b) may be made to coincide with (a) by sliding and rotating; (c) must be flipped over as well.

† That is, if the 90° rotation from the positive x′ axis to the positive y′ axis is counterclockwise.




FIGURE 5.8 


The next proposition shows, among other things, that the two concepts of congruence coincide. Notice that one way of turning over a figure in the plane is to reflect it in a line, any line (exercise 43).

Proposition 5.7 A congruence in the plane may be factored as a composition of at most three transformations: a reflection in a line, a rotation, and a translation.


Proof Let f: R² → R² be a congruence. Notice that we do not assume that f is affine, so we cannot immediately apply Proposition 5.4. We first show that f is affine.

Let b = f(0) and let g(x) = f(x) − b. Since f is a congruence and translation is a congruence, it follows that g is also a congruence (exercise 41). We will use Proposition 2.17 to show that g is linear.

Let v, w be two vectors in R². Let us distinguish in this instance between the vectors v, w and the points p, q at their tips. Let r be the tip of v + w; see Figure 5.8.

The function g carries the points o, p, q, r to o = g(o), g(p), g(q), g(r). Since d(o, q) = d(p, r), it follows that d(o, g(q)) = d(g(p), g(r)), and since d(o, p) = d(q, r), it follows that d(o, g(p)) = d(g(q), g(r)). Thus o, g(p), g(q), g(r) form a parallelogram, and hence g(v + w) = g(v) + g(w).

Next we wish to show that g(av) = ag(v). Let s be the endpoint of av. Now the points o, p, s are all on a line, and so one of them is between the other two. Three arrangements arise when a < 0, 0 < a < 1, or a > 1. Take, for example, the case a > 1, which is shown in Figure 5.8. In this case p is between o and s, and so d(o, s) = d(o, p) + d(p, s). Hence d(o, g(s)) = d(o, g(p)) + d(g(p), g(s)), which shows that the points lie on a straight line by the triangle inequality. Since d(o, s) = a d(o, p), d(o, g(s)) = a d(o, g(p)) and hence g(av) = ag(v). The other two cases are similar.

By Proposition 2.17 g is a linear map. Say g(X) = AX. Now we may apply Proposition 5.4 to show that A is orthogonal.

Any orthogonal matrix A can be used to give a linear coordinate change X = AX′, which preserves dot products. Thus, by Example 5.6, A = R_θ or

$$A = R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad\text{where } 0 \le \theta < 2\pi$$

It follows that

$$f(X) = b + g(X) = \begin{cases} b + R_\theta X \\ b + R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} X \end{cases} \qquad 0 \le \theta < 2\pi$$

Thus f is (possibly) a reflection (in the x axis) followed by a rotation (about the origin, perhaps through the angle θ = 0) followed by a translation (perhaps b = 0). ∎


EXAMPLE 5.7 Factor the following congruences of the plane into (possibly) a reflection in the x axis, followed by a counterclockwise rotation about the origin (possibly through θ = 0), followed by a (possibly trivial) translation.

1. Reflection in the line x + y = 1.
2. Rotation through the angle α, followed by reflection in the x axis, followed by a rotation through β, followed by reflection in the y axis.


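Solutions 1. We imitate the proof of Proposition 5.7 and let b = f(0). A sketch (Figure 5.9) shows that b = (1, 1). The map g(X) = f(X) − b is linear, so g(X) = AX, where the columns of A are g(1, 0) and g(0, 1) (Proposition 2.17). Since f(1, 0) = (1, 0), f(0, 1) = (0, 1), and g(X) = f(X) − (1, 1), we have that

$$g(X) = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} X, \qquad f(X) = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} X$$

FIGURE 5.9

Now

$$A = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$$

is not a rotation matrix, since A[1;2] ≠ −A[2;1], so reflection in the x axis must be involved. Since

$$A\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = R_{-\pi/2}$$

we have

$$f(X) = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} X$$

So reflection in x + y = 1 is the same as reflection in the x axis, followed by a rotation of 270° (or −90°), followed by the translation that carries the origin to (1, 1).

Here is a second solution. To find a formula for f we pick an orthonormal coordinate system in which the x′ axis is x + y = 1. For example,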


1 
= (| + RigyX! oF 


Then in the x’y’ coordinate system fis given by 


0 
=! 


mle eoleED 
(13 18 3h 


wo=[i]+L ao 


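2. We have

$$f(X) = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} R_\beta \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} R_\alpha X$$

Now

$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} R_\beta \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = R_{-\beta}$$

(check it), and the matrix of reflection in the y axis is the negative of the matrix of reflection in the x axis, so

$$f(X) = -R_{-\beta}R_\alpha X = R_\pi R_{\alpha - \beta} X = R_{\pi + \alpha - \beta} X$$

So this composition of rotations and reflections reduces to a single rotation. ∎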
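Both factorizations in Example 5.7 can be spot-checked numerically. The Python sketch below assumes rotation(theta) is the counterclockwise rotation R_θ; the test angles α = 0.7 and β = 2.1 are arbitrary, so this is a sanity check rather than a proof.

```python
import math

def rotation(theta):
    """Counterclockwise rotation R_theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def close(A, B, tol=1e-12):
    """Entrywise comparison up to round-off."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

FLIP_X = [[1, 0], [0, -1]]  # reflection in the x axis
FLIP_Y = [[-1, 0], [0, 1]]  # reflection in the y axis

# Part 1: the linear part A of reflection in x + y = 1 should equal
# a 270-degree rotation applied after the x-axis reflection.
A = [[0, -1], [-1, 0]]
print(close(matmul(rotation(3 * math.pi / 2), FLIP_X), A))  # True

# Part 2: FLIP_Y R_beta FLIP_X R_alpha should be the rotation R_(pi + alpha - beta).
alpha, beta = 0.7, 2.1  # arbitrary test angles
M = matmul(FLIP_Y, matmul(rotation(beta), matmul(FLIP_X, rotation(alpha))))
print(close(M, rotation(math.pi + alpha - beta)))  # True
```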




EXERCISES 5.1 


In Exercises 1 through 5 find the distances and angles between the given pair of vectors.

1. (1, 0) and (0, 1).    2. (1, 0) and (1, 1).
3. (1, 0, 0) and (1, 1, 1).    4. (1, 0, 0, 0) and (1, 1, 1, 1).
5. Leamand mpl asin —> 20 


In exercises 6 through 11 the three vertices p1, p2, p3 of a triangle are given. Identify the triangles as (a) right, (b) isosceles, (c) equilateral, (d) degenerate (i.e., the vertices are collinear), (e) having an obtuse interior angle, or (f) none of these.
Hint: As in Example 5.4, let u and v be differences of the vertices and set A = [u | v | u + v]. Then all necessary information is contained in AᵀA.


6 Vi) a 8 2,-2.0 
3 V3.3) Ps = (1. -2.2) 
V3.5) Py =, -1 0) 
9 =1.3,3,-0 10. We p=. =D 
2, 6,6, 2) Pz = (0.2, 0) 
6,2, 2,6) Ps =(—3,2,3) 
Write the APL function described in exercises 12 through 18.
12. Name: NORM
Right argument: A vector v
Result: ‖v‖
13. Name: DIST
Left argument: A vector v in Rⁿ
Right argument: A vector w in Rⁿ
Result: d(v, w)
14. Name: SPREAD
Left argument: A vector v in Rⁿ
Right argument: A vector w in Rⁿ
Result: Angle in radians between v and w
15. Name: DSPREAD
Left argument: A vector v in Rⁿ
Right argument: A vector w in Rⁿ
Result: The angle in degrees between v and w
16. Name: INTANG
Right argument: The vertices of a triangle stored as the columns of a matrix
Result: The vector consisting of the three interior angles of the triangle in radians
17. Name: SIDES
Right argument: The vertices of a triangle stored as the columns of a matrix
Result: The vector consisting of the lengths of the sides
18. Name: AREA
Right argument: The vertices of a triangle stored as the columns of a matrix
Result: The area of the triangle

Hint: See exercise 14 in Exercises 3.1.




19. Find two linearly independent vectors perpendicular to w = (1, −1, 1).
20. Find a nonzero vector perpendicular to u = (1, −1, 1) and v = (1, 1, 0).
21. Find two nonzero vectors v and w perpendicular to u = (1, −1, 1) and to each other.

22. (a) Find a vector v2 perpendicular to v1 = (1, −1, 1, −1) and lying in the subspace generated by v1 and u = (1, 0, 1, 0).
(Hint: Substitute X = [v1 | u]T in v1ᵀX = 0.)

(b) Find a vector v3 perpendicular to v1 and v2 of part (a) and lying in the subspace generated by v1, u, and w = (1, 2, 3, 4).


In exercises 23 through 28 an affine function is given. Is the function an isometry? 


_ 1 fe-y= vi 
nae) 
25. The affine function f: R² → R² that carries the triangle with vertices (0, 0), (1, 0), (0, 1) to the triangle with vertices (3, 4) = f(0, 0), (2, 4) = f(1, 0), (3, 5) = f(0, 1).
26. The affine function f: R² → R² that carries the triangle with vertices (0, 0), (1, 0), (0, 1) to the triangle with vertices (3, 5) = f(0, 0), (2, 4) = f(1, 0), (3, 4) = f(0, 1).


27, FRE RE by flu) = (46) + [1 3.26.4 Sh. 


<== 


2B. flx 


24. f(xy) = ( 


28. flx,y.2) = 


In exercises 29 through 33 an affine coordinate change is given. Is the new coordinate system orthonormal?


m t=[ altel ol” 
seta li ka Ha 


31. The new coordinate system is given by py = (3,4). py = (2.4). Ps 
32. The new coordinate system is given by $p_0 = (3,5)$, $p_1 = (2,4)$, $p_2 = (3,4)$.


33. 


34. Show that a parallelogram with diagonals of equal length is a rectangle. 
35. Show that the law of cosines is true for triangles in $R^n$.




36. Let $u \perp v$ in $R^n$ and let $\theta$ be the angle between $u$ and $u + v$. Show that
$$\cos\theta = \frac{\|u\|}{\|u + v\|} \quad\text{and}\quad \sin\theta = \frac{\|v\|}{\|u + v\|}$$
37. Show that if the Schwarz inequality is an equality, then the two vectors are linearly dependent.
Hint: Let $A = [u \mid v]$ and apply Propositions 2.14 and 2.7.


38. Show that if the triangle inequality $\|u + v\| \le \|u\| + \|v\|$ is an equality, then $u$, $v$, and $u + v$ are parallel. Assume Exercise 37.


39. Show statement 1 of Proposition 5.4.

40. Show that a product of orthogonal matrices is orthogonal. 

41. Show that a composition of isometries is an isometry. That is, if $f$ and $g$ are isometries, then $h(X) = f(g(X))$ is an isometry.

42. Show that a composition of similarities is a similarity. That is, if $f$ and $g$ are similarities, then $h(X) = f(g(X))$ is a similarity.


43. Consider $R^2$ to be the $xy$ plane in $R^3$. Show that reflection in a line in $R^2$ can be accomplished by a 180° rotation about the line in $R^3$.
Hint: Since orthonormal coordinate changes preserve the distance and angle formulas, you may assume that the line is the $x$ axis.


44. Show that similarities map lines to lines.
Hint: Similarities preserve 180° angles; assume the result of Exercise 37.


45. Show that $f: R^n \to R^n$ given by $Y = f(X) = aX$, $a$ a scalar, is a similarity that is an isometry only when $|a| = 1$.


46. Assuming the result of Exercise 44:
(a) Show that a similarity maps any triangle to a triangle that is similar.
(b) Let $p, q, r$ be points in $R^n$, $f: R^n \to R^n$ a similarity, and let $d(f(p), f(q)) = a\,d(p, q)$ (i.e., $a = d(f(p), f(q))/d(p, q)$). Show that $d(f(p), f(r)) = a\,d(p, r)$ and $d(f(r), f(q)) = a\,d(r, q)$ also.
Hint: Use part (a).
(c) Show that if $f$ is a similarity, then there is a constant $a$ such that $d(f(p), f(q)) = a\,d(p, q)$ for all points $p$ and $q$.
Hint: Given any $p_1, q_1$ and any $p_2, q_2$, there are constants $a_1, a_2$ such that $a_i\,d(p_i, q_i) = d(f(p_i), f(q_i))$ for $i = 1, 2$. Use part (b) to show that $a_1 = a_2$.
47. Let $f: R^2 \to R^2$ be a rotation through the angle $\theta$ about the point $p$. Write $f$ as a rotation about the origin followed by a translation.
Hint: See Example 4.10.
48. Let $l_1$ and $l_2$ be two lines intersecting at the origin. Let $f: R^2 \to R^2$ be reflection in $l_1$ followed by reflection in $l_2$. Show that $f$ is rotation through the angle $2\alpha$, where $\alpha$ is the acute angle from $l_1$ to $l_2$.
Hint: Formulas for the reflection are given in Example 4.11.
49. Exercise 46 shows that a (not necessarily affine) function $f: R^2 \to R^2$ is a similarity if and only if there is a constant $a > 0$ such that $d(f(p), f(q)) = a\,d(p, q)$ for all $p, q$ in $R^2$. Assuming this result, show that a similarity is the composition of at most four maps: a reflection in the $x$ axis, a rotation about the origin, multiplication by a scalar, and a translation.
Hint: Show that $g(X) = (1/a)f(X)$ is a congruence. Use Exercise 33.
50. Derive the law of cosines from the Pythagorean theorem (Figure 3.3).
51. Prove statement 1 of Proposition 5.1.
52. Prove statement 2 of Proposition 5.1.


53. Any three vectors in $R^n$ lie in a subspace of dimension 3 at most. By putting an orthonormal coordinate system on this subspace, we obtain a geometry-preserving representation of the three vectors in $R^n$ as three physical vectors in space ($R^3$). This allows us to analyze the relationships among the vectors geometrically.
(a) Let $u, v, w$ be three vectors in space. Suppose that the angle between $u$ and $v$ is 45° and the angle between $v$ and $w$ is also 45°. Show by a sketch that the angle between $u$ and $w$ can vary from 0° to 90°.
(b) Let $X, Y, Z$ be three measurements on a population. Suppose that the correlations between $X$ and $Y$ and between $Y$ and $Z$ are .7. Show that $r_{XZ}$ can vary from 1 to 0 (approximately).
Hint: Use part (a).
(c) Let $X, Y, Z$ again be as in part (b). Suppose that $r_{XY} = .7$ and $r_{YZ} = .5$. Can $r_{XZ}$ be negative?


5.2 The Diagonalization of Symmetric Matrices 


In this section we will extend the methods used to analyze conics in Section 4.3 to quadratic functions defined on $R^n$ for $n > 2$. This has several important applications, including the second-derivative test for maxima and minima (Section 5.3*) and moments of inertia (Section 5.6*).

Given a quadratic function
$$f(X) = c + BX + X^TAX$$
where $A = A^T$ is $n$ by $n$, $B$ is 1 by $n$, and $c$ is a scalar, we wish to simplify $f$ by an affine coordinate change
$$X = p_0 + PX'$$


Substituting the coordinate-change formula into the expression for $f$, we obtain
$$f(X) = f(p_0) + (B + 2p_0^TA)PX' + X'^T(P^TAP)X'$$


Notice that the choice of $p_0$ does not affect the quadratic term. If the equation $B + 2p_0^TA = 0$ has a solution $p_0$, then the linear term may be eliminated by choosing $p_0$ independently of $P$. In many applications, however, the linear term is automatically zero and the main problem is that of choosing $P$. One chooses $P$ to eliminate the cross-product terms (i.e., terms of the form $x_ix_j$ with $i \ne j$) from the expression for $f$. This is equivalent to choosing $P$ so that $P^TAP$ is diagonal, that is, $(P^TAP)[i;j] = 0$ if $i \ne j$.
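The substitution formula for $f(p_0 + PX')$ can be spot-checked numerically. The Python/NumPy sketch below uses arbitrary illustrative values (assumptions, not data from the text) for $A$, $B$, $c$, $p_0$, $P$, and $X'$; the two computed sides should agree up to roundoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Arbitrary illustrative data (hypothetical): symmetric A, 1-by-n row B, scalar c.
M = rng.standard_normal((n, n))
A = M + M.T
B = rng.standard_normal((1, n))
c = 2.0

def f(X):
    """f(X) = c + BX + X^T A X for an n-by-1 column vector X."""
    return c + (B @ X).item() + (X.T @ A @ X).item()

p0 = rng.standard_normal((n, 1))   # new origin
P = rng.standard_normal((n, n))    # coordinate-change matrix
Xp = rng.standard_normal((n, 1))   # coordinates X' in the new system

lhs = f(p0 + P @ Xp)
rhs = (f(p0)
       + ((B + 2 * p0.T @ A) @ P @ Xp).item()   # linear term
       + (Xp.T @ (P.T @ A @ P) @ Xp).item())    # quadratic term
print(lhs, rhs)
```

Changing `p0` alters only the constant and linear terms; the quadratic coefficient matrix `P.T @ A @ P` never involves `p0`.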

In Section 4.3 it was shown that if $A$ is 2 by 2, then there is a rotation matrix $R_\theta$ such that $R_\theta^TAR_\theta$ is diagonal; this means that $X'^T(R_\theta^TAR_\theta)X'$ has no $x'y'$ term. In the general case it is possible to find an orthogonal matrix $Q$ such that $Q^TAQ = \Lambda$ is diagonal; however, $Q$ is considerably more difficult to compute.

We will compute $Q$ by the iterative process known as the Jacobi algorithm. We begin with a symmetric matrix $A$ that is not diagonal. Suppose that $A[i;j] \ne 0$ with $i \ne j$. Then we use a coordinate change $X = Q_1X'$ that rotates the $x_ix_j$ plane so that if $A_1 = Q_1^TAQ_1$, then $A_1[i;j] = 0$. Now if $A_1$ is not diagonal, then there is a component $A_1[h;k] \ne 0$ with $h \ne k$. This time we use $Q_2$ to rotate the $x_hx_k$ plane so that if $A_2 = Q_2^TA_1Q_2$, then $A_2[h;k] = 0$.

Unfortunately this process does not terminate in a finite number of steps, because $A_2[i;j] \ne 0$ in general, even though $A_1[i;j] = 0$. We will describe a way of choosing the coordinate changes $Q_n$, however, so that the sequence of matrices $A_1, A_2, A_3, \ldots, A_n$ approaches a diagonal matrix as $n \to \infty$. We may then carry out the process until $A_n$ is close enough to a diagonal matrix for our purposes.

If the coordinate-change matrices $Q_n$ are indexed so that
$$A_n = Q_n^TA_{n-1}Q_n \qquad (A_0 = A)$$
then
$$\begin{aligned}
A_n &= Q_n^T \cdots Q_2^TQ_1^TAQ_1Q_2 \cdots Q_n \\
&= (Q_1Q_2 \cdots Q_n)^TA(Q_1Q_2 \cdots Q_n) \\
&= Q^TAQ
\end{aligned}$$
where $Q = Q_1Q_2 \cdots Q_n$. The matrices $Q_n$ will be chosen to be orthogonal, and so $Q$ will be orthogonal (exercise 40 of Exercises 5.1).

The details of the algorithm are contained in the proof sketch of Proposition 5.9 below. First, however, we need some formulas involving matrix multiplication. These formulas involve vectors of row and column indices. We will say that such a vector of indices $U$ is partitioned by the vectors $V$ and $W$ if $V$ and $W$ have no components in common and the catenation $V,W$ contains the same entries as $U$. This means that $V,W$ is just $U$, possibly with components reordered. For example, $U = (1, 2, 3, 4, 5)$ is partitioned by $V = (2, 4)$ and $W = (5, 3, 1)$.

The next proposition follows easily from Proposition 2.1. The proof is left as exercise 26.


Proposition 5.8 Let $A$ and $B$ be matrices such that $AB$ is defined. Let $I, J, U, V, W$ be appropriate vectors of indices and suppose that $U$ is partitioned by $V$ and $W$. Then

1. $(AB)[I;J] = A[I;\,]B[\,;J]$
2. $A[I;U]B[U;J] = A[I;V]B[V;J] + A[I;W]B[W;J]$
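Both statements of Proposition 5.8 are easy to spot-check numerically. The Python/NumPy sketch below uses 0-based indices, so the text's example $U = (1,2,3,4,5)$, $V = (2,4)$, $W = (5,3,1)$ becomes `U = [0,1,2,3,4]`, `V = [1,3]`, `W = [4,2,0]`; the matrices are random and the index vectors `I` and `J` are arbitrary choices for illustration. Here `np.ix_` plays the role of APL's bracket indexing `A[I;J]`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))

I = [0, 2, 3]        # some row indices of A (0-based)
J = [1, 2]           # some column indices of B
U = [0, 1, 2, 3, 4]  # all inner indices
V = [1, 3]           # V and W partition U
W = [4, 2, 0]

# Statement 1: (AB)[I;J] = A[I;]B[;J]
lhs1 = (A @ B)[np.ix_(I, J)]
rhs1 = A[I, :] @ B[:, J]

# Statement 2: A[I;U]B[U;J] = A[I;V]B[V;J] + A[I;W]B[W;J]
lhs2 = A[np.ix_(I, U)] @ B[np.ix_(U, J)]
rhs2 = A[np.ix_(I, V)] @ B[np.ix_(V, J)] + A[np.ix_(I, W)] @ B[np.ix_(W, J)]

print(np.allclose(lhs1, rhs1), np.allclose(lhs2, rhs2))
```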


Proposition 5.9 Let $A$ be an $n$-by-$n$ matrix. Then statements 1 and 2 are equivalent.

1. $A$ is symmetric.

2. There is an orthogonal matrix $Q$ such that $Q^TAQ$ is diagonal.


Proof We will argue that statement 1 implies statement 2, leaving as exercise 28 the proof that statement 2 implies statement 1.
We need a measure of how "nondiagonal" a matrix is. We define $SS(A)$ to be the sum of the squares of the off-diagonal entries of $A$. Then $A$ is diagonal if and only if $SS(A) = 0$.
Now suppose that $A$ is not diagonal, say that $A[i;j] \ne 0$ where $i \ne j$. Let $K = (i, j)$ and let $L$ be the rest of the indices from 1 to $n$, so that $K$ and $L$ partition $(1, 2, \ldots, n)$.

Since $A$ is symmetric, so is the 2-by-2 submatrix $A[K;K]$. By Proposition 4.10 there is a rotation matrix $R_\theta$ such that $R_\theta^TA[K;K]R_\theta$ is diagonal. Define a matrix $Q_1$ as follows. Begin with an $n$-by-$n$ identity matrix (Q₁←ID n) and then replace the $K;K$ block by $R_\theta$ (Q₁[K;K]←R_θ). Thus $Q_1[K;K] = R_\theta$, $Q_1[L;L] = I$, $Q_1[L;K] = 0$, and $Q_1[K;L] = 0$. The matrix $Q_1$ is orthogonal (Exercise 27).
Set $A_1 = Q_1^TAQ_1$. Using the formulas of Proposition 5.8, we have


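$$\begin{aligned}
(AQ_1)[K;K] &= A[K;\,]Q_1[\,;K] \\
&= A[K;K]Q_1[K;K] + A[K;L]Q_1[L;K] \\
&= A[K;K]R_\theta
\end{aligned}$$
Similarly $(AQ_1)[L;K] = A[L;K]R_\theta$, $(AQ_1)[K;L] = A[K;L]$, and $(AQ_1)[L;L] = A[L;L]$. Hence
$$\begin{aligned}
A_1[K;K] &= Q_1^T[K;\,](AQ_1)[\,;K] \\
&= Q_1^T[K;K](AQ_1)[K;K] + Q_1^T[K;L](AQ_1)[L;K] \\
&= R_\theta^TA[K;K]R_\theta + 0 \\
&= R_\theta^TA[K;K]R_\theta
\end{aligned}$$
Thus $A_1[i;j] = 0$. Similarly we have $A_1[L;K] = A[L;K]R_\theta$, $A_1[K;L] = R_\theta^TA[K;L]$, and $A_1[L;L] = A[L;L]$.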

We are now in a position to compare $SS(A)$ and $SS(A_1)$. Notice that for any $n$-by-$n$ matrix $B$, $SS(B)$ is $SS(B[K;K])$ plus $SS(B[L;L])$ plus the sums of the squares of all the entries of $B[L;K]$ and $B[K;L]$.

Since $A_1[L;L] = A[L;L]$, we certainly have $SS(A_1[L;L]) = SS(A[L;L])$.

Notice that the columns of $A_1[K;L]$ are the columns of $A[K;L]$ rotated through the angle $-\theta$ ($R_\theta^T = R_{-\theta}$). Now the sum of the squares of the entries of $A_1[K;L]$ is the sum of the squares of the lengths of its columns. Thus the sums of the squares of the entries of $A_1[K;L]$ and $A[K;L]$ are the same. A similar argument (take transposes) shows that the sums of the squares of the entries of $A_1[L;K]$ and $A[L;K]$ are the same. Finally, $SS(A[K;K]) = 2A[i;j]^2 > 0$ and $SS(A_1[K;K]) = 0$. Thus
$$SS(A_1) = SS(A) - 2A[i;j]^2 < SS(A)$$


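This shows that $SS(A_n)$ decreases with $n$, but does not show that it decreases to zero. Suppose, however, that $A[i;j]^2$ was the largest of the squares of the off-diagonal entries of $A$. There are $n^2$ entries in $A$, and so there are $n^2 - n$ off-diagonal entries. Since the largest number is always at least as large as the average, we have
$$A[i;j]^2 \ge \frac{SS(A)}{n^2 - n}$$
Therefore
$$\begin{aligned}
SS(A_1) &= SS(A) - 2A[i;j]^2 \\
&\le SS(A) - \frac{2\,SS(A)}{n^2 - n} \\
&= \left(1 - \frac{2}{n^2 - n}\right)SS(A)
\end{aligned}$$
Let
$$q = 1 - \frac{2}{n^2 - n}$$
Then, since $0 \le q < 1$,
$$SS(A_n) \le q^n\,SS(A) \to 0 \quad \text{as } n \to \infty$$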
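The two facts the proof rests on, that one rotation in the $x_ix_j$ plane lowers $SS$ by exactly $2A[i;j]^2$ and that choosing the largest entry gives $SS(A_1) \le q\,SS(A)$, can be checked numerically. This Python/NumPy sketch uses the matrix of Example 5.8. The angle $\theta = \tfrac12\operatorname{atan2}(2A[i;j],\ A[i;i] - A[j;j])$ is one standard choice that makes $R_\theta^TA[K;K]R_\theta$ diagonal; it is an assumption here, not necessarily what the book's ANG computes.

```python
import numpy as np

def off_diag_ss(A):
    """SS(A): the sum of the squares of the off-diagonal entries of A."""
    return np.sum(A**2) - np.sum(np.diag(A)**2)

def jacobi_rotation(A, i, j):
    """Q1 = identity except Q1[K;K] = R_theta, K = (i, j), with theta chosen
    so that (Q1^T A Q1)[i;j] = 0."""
    theta = 0.5 * np.arctan2(2 * A[i, j], A[i, i] - A[j, j])
    c, s = np.cos(theta), np.sin(theta)
    Q1 = np.eye(A.shape[0])
    Q1[np.ix_([i, j], [i, j])] = [[c, -s], [s, c]]
    return Q1

A = np.array([[1.0, 2, 3], [2, 4, 5], [3, 5, 6]])
n = A.shape[0]
off = np.abs(A - np.diag(np.diag(A)))
i, j = np.unravel_index(np.argmax(off), off.shape)   # largest |A[i;j]|, i != j

Q1 = jacobi_rotation(A, i, j)
A1 = Q1.T @ A @ Q1
q = 1 - 2 / (n**2 - n)

print(A1[i, j])                                          # roundoff-level
print(off_diag_ss(A) - off_diag_ss(A1), 2 * A[i, j]**2)  # the two agree
print(off_diag_ss(A1) <= q * off_diag_ss(A))
```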


The proof sketch above is not rigorous. It shows that the off-diagonal entries of $A_n$ go to zero, but it does not show that the diagonal of $A_n$ approaches a fixed vector or that the accumulated product $Q_1Q_2 \cdots Q_n$ approaches a fixed orthogonal matrix $Q$. It does, however, show how to compute an orthogonal matrix $Q$ so that $Q^TAQ$ is as close to diagonal as desired, and we now develop an APL function to do so. The logical gaps in the development can be filled after eigenvalues are discussed in Chapter 7.


THE FUNCTION JACOBI 


Given a symmetric matrix $A$, we will apply the iteration scheme described until we arrive at a matrix $A_n$ whose off-diagonal entries are negligible compared to the original entries of $A$. We will then take $Q = Q_1Q_2 \cdots Q_n$ as the coordinate-change matrix.

We will use three auxiliary functions. First we will need a function to find the indices $i, j$ such that $|A[i;j]|$ is maximal. It is sufficient to define a function, call it JACFIND, that given a matrix A finds the row and column indices of the largest off-diagonal entry of |A.

We will also use the function ROT from Section 4.3, which computes $R_\theta$ given $\theta$, and a function ANG (exercise 12 of Exercises 4.3), which computes $\theta$ given $A[K;K]$.

To begin the iteration we set $A_0 = A$ and $Q_0$ = ID 1↑⍴A. We take B equal to the largest magnitude in A (⌈/,|A) and stop the iteration when B = B+|A[i;j], that is, when $A[i;j]$ is negligible compared to B (cf. Section 3.4).


    ∇ Z←JACOBI A;B;K;Q
[1]   Z←ID 1↑⍴A
[2]   B←⌈/,|A
[3]  L:K←JACFIND A
[4]   →(B=B+|A[K[1];K[2]])/0
[5]   Q←ID 1↑⍴A
[6]   Q[K;K]←ROT ANG A[K;K]
[7]   A←(⍉Q)+.×A+.×Q
[8]   Z←Z+.×Q
[9]   →L
    ∇


A function that will do for JACFIND is

    ∇ Z←JACFIND A;B
[1]   A←(|A)+(⌈/⍳0)×ID 1↑⍴A
[2]   Z←(⌈/A)⍳B←⌈/,A
[3]   Z←Z,A[Z;]⍳B
    ∇

The quantity ⌈/⍳0 is the smallest number that can be stored in the machine, "minus infinity" in effect. Adding this number to the diagonal of |A ensures that, in cases of practical interest, the diagonal is negative. In fact the diagonal entries will be ⌈/⍳0, since numbers of practical interest are negligible compared to ⌈/⍳0.

Line [2] sets B equal to the largest off-diagonal entry of |A and computes the index of the first row in which B occurs. Line [3] then adds the column index.

The function ANG is left as exercise 1.
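For readers without an APL system, the following Python/NumPy sketch mirrors JACOBI and JACFIND. It is an illustration under stated assumptions, not a transcription: the rotation angle uses the atan2 formula (standing in for ROT and ANG), and the APL stopping test B = B+|A[i;j], which relies on the comparison tolerance ⎕CT, is replaced by an explicit relative tolerance `tol`, with `max_iter` as a safety cap. Both parameter names are introduced here for illustration.

```python
import numpy as np

def jacfind(A):
    """Row and column indices of the largest off-diagonal entry of |A|
    (analog of JACFIND: the diagonal is masked with -inf)."""
    M = np.abs(A)
    np.fill_diagonal(M, -np.inf)
    return np.unravel_index(np.argmax(M), M.shape)

def jacobi(A, tol=1e-14, max_iter=10_000):
    """Orthogonal Q with Q^T A Q close to diagonal, for symmetric A.

    Stops when the largest off-diagonal magnitude is at most tol times the
    largest magnitude B in the original A."""
    A = np.array(A, dtype=float)       # work on a copy
    n = A.shape[0]
    Z = np.eye(n)                      # accumulated product Q1 Q2 ... Qn
    B = np.max(np.abs(A))
    for _ in range(max_iter):
        i, j = jacfind(A)
        if abs(A[i, j]) <= tol * B:
            break
        theta = 0.5 * np.arctan2(2 * A[i, j], A[i, i] - A[j, j])
        c, s = np.cos(theta), np.sin(theta)
        Q = np.eye(n)
        Q[np.ix_([i, j], [i, j])] = [[c, -s], [s, c]]
        A = Q.T @ A @ Q                # line [7] of JACOBI
        Z = Z @ Q                      # line [8] of JACOBI
    return Z

A = np.array([[1.0, 2, 3], [2, 4, 5], [3, 5, 6]])
Q = jacobi(A)
D = Q.T @ A @ Q
print(np.round(np.diag(D), 3))
print(np.max(np.abs(np.eye(3) - Q.T @ Q)))
```

Because the angle convention differs from the book's ANG, the diagonal entries may come out in a different order from the APL session in Example 5.8.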


EXAMPLE 5.8 We will follow the operation of JACOBI by putting a trace on line [7]. In APL, tracing is controlled by the trace vector of a function (see Appendix B). The trace vector of the APL function FCN is written T∆FCN. The trace vector is created and erased with the function and is not listed by the )VARS command. To set a trace on lines [6] and [4] of the function FCN use the expression

      T∆FCN←6 4

Then every time line [6] of FCN is executed the machine prints FCN[6] followed by the last value computed on line [6]; similarly for line [4]. To turn off tracing use the expression

      T∆FCN←⍳0

Tracing is a debugging aid in the APL system, but it is very useful for gaining insight into iterative procedures. We will use it to watch JACOBI operate on a 3-by-3 matrix A.


      A
1 2 3
2 4 5
3 5 6
      T∆JACOBI←7
      Q←JACOBI A
JACOBI[7]
 1.00E0   ¯3.55E¯1   3.59E0      The 2;3 entry is set to zero.
¯3.55E¯1  ¯9.90E¯2   6.94E¯18    The 1;3 entry increases, but
 3.59E0    6.94E¯18  1.01E1      the 1;2 entry decreases.
JACOBI[7]
¯2.45E¯1  ¯3.36E¯1   1.04E¯17    The 1;3 entry is zero. The
¯3.36E¯1  ¯9.90E¯2  ¯1.16E¯1     off-diagonal entries are all an
 1.04E¯17 ¯1.16E¯1   1.13E1      order of magnitude smaller than
                                 the off-diagonal entries of A.
JACOBI[7]
¯5.15E¯1  ¯5.42E¯19 ¯7.31E¯2     Two orders of magnitude smaller.
 8.67E¯19  1.72E¯1  ¯9.06E¯2
¯7.31E¯2  ¯9.06E¯2   1.13E1
JACOBI[7]
¯5.15E¯1  ¯5.93E¯4  ¯7.31E¯2
¯5.93E¯4   1.71E¯1   1.00E¯19
¯7.31E¯2   1.91E¯18  1.13E1
JACOBI[7]
¯5.16E¯1  ¯5.93E¯4   1.00E¯18    Four orders of magnitude smaller.
¯5.93E¯4   1.71E¯1   3.65E¯6
 2.41E¯19  3.65E¯6   1.13E1
JACOBI[7]
¯5.16E¯1  ¯3.44E¯19  3.15E¯9     Six orders of magnitude smaller.
 6.56E¯19  1.71E¯1   3.65E¯6
 3.15E¯9   3.65E¯6   1.13E1
JACOBI[7]
¯5.16E¯1  ¯1.03E¯15  3.15E¯9     Nine orders of magnitude smaller.
¯1.03E¯15  1.71E¯1  ¯9.40E¯20
 3.15E¯9   1.24E¯18  1.13E1
JACOBI[7]
¯5.16E¯1  ¯1.03E¯15  4.22E¯17    The iteration stops, since ⎕CT is
¯1.03E¯15  1.71E¯1  ¯9.40E¯20    about 3E-15 for this system.
 1.51E¯17 ¯1.24E¯18  1.13E1




The coordinate-change matrix Q is

      Q
 0.737 ¯0.591  0.328
 0.328  0.737  0.591
¯0.591 ¯0.328  0.737

In the accumulation $Q = Q_1Q_2 \cdots Q_n$, little computational error is involved. The computed matrix Q is quite orthogonal (this system carries eighteen digits):

      (ID 3)-(⍉Q)+.×Q
 1.73E¯18 ¯1.52E¯18  8.67E¯19
¯1.52E¯18 ¯3.47E¯18 ¯1.08E¯18
 8.67E¯19 ¯1.08E¯18  1.73E¯18

And $Q^TAQ$ is close to diagonal:

      (⍉Q)+.×A+.×Q
¯5.16E¯1  ¯1.03E¯15  2.08E¯17
¯1.03E¯15  1.71E¯1   1.04E¯17
 1.77E¯17  6.72E¯18  1.13E1


If a quadratic function $f: R^n \to R$ can be simplified by a coordinate change to the form
$$f(X') = c + \lambda_1(x'_1)^2 + \lambda_2(x'_2)^2 + \cdots + \lambda_n(x'_n)^2$$
then $f$ has a maximum at $X' = 0$ if and only if $\lambda_i \le 0$ for all $i$, and a minimum at $X' = 0$ if and only if $\lambda_i \ge 0$ for all $i$. If some $\lambda_i$ are positive and some negative, then $f$ has neither a maximum nor a minimum at $X' = 0$. In this case $f$ is said to have a saddle at $X' = 0$.

If the linear term cannot be eliminated by a coordinate change, then the methods of the optional Section 2.7 can be used to show that $f$ has neither a maximum nor a minimum nor a saddle (exercise 23). In this case we say that $f$ has no critical points. If the coordinate change $X = p_0 + QX'$ eliminates the linear term, then $p_0$ is a critical point for $f$.


EXAMPLE 5.9 Discuss the critical points of
$$f(x, y, z) = 4 + 3x - y + 2z + 3x^2 + 2y^2 + 2z^2 + 4xz + 2yz$$
Solution
$$f(x, y, z) = 4 + \begin{bmatrix} 3 & -1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} 3 & 0 & 2 \\ 0 & 2 & 1 \\ 2 & 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = c + BX + X^TAX$$
To eliminate the linear term we need $p_0$ such that $B + 2p_0^TA = 0$, or $Ap_0 = -\tfrac12B^T$.

      ⎕←P0←(3 ¯1 2)⌹A×¯2
0.5 1 ¯1.5

Thus $f$ does have a critical point. To diagonalize $A$ we use

      ⎕←Q←JACOBI A
0.739 ¯0.427 ¯0.521
0.233  0.888 ¯0.397
0.632  0.172  0.756

to obtain

      (⍉Q)+.×A+.×Q
4.71E0   8.67E¯19 1.15E¯17
1.79E¯18 2.19E0   2.85E¯18
1.56E¯17 2.17E¯18 9.68E¯2

Since $f(p_0)$ is

      4+(3 ¯1 2+.×P0)+P0+.×A+.×P0
2.75

we see that using the coordinate change $X = p_0 + QX'$ we obtain, to three significant figures,
$$f(x', y', z') = 2.75 + 4.71(x')^2 + 2.19(y')^2 + 0.0968(z')^2$$
Thus $f$ has an absolute minimum value of 2.75 at the critical point $p_0 = (\tfrac12, 1, -\tfrac32)$. ■
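The same computation can be sketched in Python/NumPy. Here `np.linalg.solve` takes the place of ⌹ and `np.linalg.eigvalsh` takes the place of diagonalizing with JACOBI; since both are library routines rather than the book's functions, treat this as an independent cross-check of Example 5.9, not as the text's method.

```python
import numpy as np

# Example 5.9: f(X) = c + BX + X^T A X
c = 4.0
B = np.array([3.0, -1.0, 2.0])
A = np.array([[3.0, 0.0, 2.0],
              [0.0, 2.0, 1.0],
              [2.0, 1.0, 2.0]])

# Critical point: B + 2 p0^T A = 0, i.e. A p0 = -B^T / 2 (A is symmetric)
p0 = np.linalg.solve(A, -B / 2)
f_p0 = c + B @ p0 + p0 @ A @ p0

# Coefficients of the squares after an orthonormal change X = p0 + QX'
lam = np.linalg.eigvalsh(A)   # ascending order

print(p0)     # approximately (0.5, 1, -1.5)
print(f_p0)   # approximately 2.75
print(lam)    # all positive, so p0 is a minimum
```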


Given a symmetric matrix $A$, Proposition 5.9 says that there is an orthogonal matrix $Q$ such that $Q^TAQ = \Lambda$ is diagonal. How unique is $\Lambda$, and how unique is $Q$? We can permute the diagonal entries of $\Lambda$ by renumbering the axes of the coordinate system that $Q$ defines. Except for order, however, the diagonal of $\Lambda$ is unique. The diagonal entries of $\Lambda$ are called eigenvalues of $A$. The next proposition gives a very useful characterization of the eigenvalues of $A$. Although we are assuming that $A$ is symmetric, the concept of an eigenvalue will be extended in Chapter 7 to general matrices in such a way that Proposition 5.10 remains true.


Proposition 5.10 The eigenvalues of $A$ are those scalars $\lambda$ for which there is a nonzero vector $v$ such that $Av = \lambda v$. That is, the eigenvalues of $A$ are the scalars $\lambda$ for which the matrix $A - \lambda I$ is singular.
Note: If $Av = \lambda v$ with $v \ne 0$, then $v$ is called an eigenvector of $A$ belonging to the eigenvalue $\lambda$.


Proof of Proposition 5.10 First assume that $Q^TAQ = \Lambda$, a diagonal matrix. Then, since $Q^T = Q^{-1}$, left multiplication by $Q$ gives
$$AQ = Q\Lambda$$
In particular
$$AQ[\,;i] = (AQ)[\,;i] = (Q\Lambda)[\,;i] = Q\Lambda[\,;i] = Q\begin{bmatrix} 0 \\ \vdots \\ \lambda_i \\ \vdots \\ 0 \end{bmatrix} = \lambda_iQ[\,;i]$$
where $\lambda_i = \Lambda[i;i]$. Thus the eigenvalues satisfy the condition of the proposition, and the columns of $Q$ are eigenvectors.
Conversely suppose that $Av = \lambda v$ with $v \ne 0$. Since $Q$ is invertible with $Q^{-1} = Q^T$, the equation $Qw = v$ has the solution $w = Q^Tv$, and $w \ne 0$. Thus
$$\begin{aligned}
AQw &= \lambda Qw \\
\Lambda w &= \lambda w
\end{aligned}$$
So $\lambda_iw[i] = \lambda w[i]$ for all $i$. Since $w \ne 0$, $w[i] \ne 0$ for some $i$; then $\lambda = \lambda_i$, and so $\lambda$ is an eigenvalue. (Further, $w[j] = 0$ for all $j$ such that $\lambda_j \ne \lambda$.)
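Both halves of Proposition 5.10 can be observed on the matrix of Example 5.8. In this Python/NumPy sketch `np.linalg.eigh` supplies an orthogonal $Q$ and the diagonal of $\Lambda$; each column of $Q$ should satisfy $Av = \lambda v$, and each $A - \lambda I$ should be singular. Singularity is measured here by the smallest singular value (a notion defined later in this section), which is zero exactly when the matrix is singular.

```python
import numpy as np

A = np.array([[1.0, 2, 3], [2, 4, 5], [3, 5, 6]])   # matrix of Example 5.8
lam, Q = np.linalg.eigh(A)   # columns of Q orthonormal, Q^T A Q = diag(lam)

residuals, smallest_svs = [], []
for k in range(len(lam)):
    v = Q[:, k]                                               # k-th column of Q
    residuals.append(np.linalg.norm(A @ v - lam[k] * v))      # |Av - lambda v|
    sv = np.linalg.svd(A - lam[k] * np.eye(3), compute_uv=False)
    smallest_svs.append(sv[-1])                               # ~0 iff singular

print(np.round(lam, 3))
print(max(residuals), max(smallest_svs))
```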


Matrix products of the form B7B often arise, They appeared in the discussion 
of least-squares approximations in Section 23 and in the formula for computing 
dot products with skewed axes in Proposition 5.6 (see exercises 31 through 34), In 
Statistics they arise in the computation of correlation and covariance matrices, 
The next proposition gives some alternate characterizations of such products. The 
converse problem of factoring a symmetric matrix 4 as BTB arises in the statistical 
technique known as a factor analysis. 


PROPOSITIO’ 
equivaleni 
(a) A = BB for some matrix B. 


11 Let A be a symmetric matrix, Statements (a), (b), and (c) are 


(b) The eigenvalues of 4 are nonnegative, 
(ce) f(X) = XTAN > 0 for all X. 


310 Orthogonality 


Statements (d), (e), and (f) also are equivalent: 


(4) A = BTB for some matrix B with linearly independent columns. 
(e) The eigenvalues of A are strictly positive. 
(0) f(X) = XTAX > 0 for all X and /(X) = 0 implies X 


Proof Let $Q$ be orthogonal such that $Q^TAQ = \Lambda$ is diagonal with diagonal entries $(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Then in the coordinate system given by $X = QX'$ we have
$$f(X) = \lambda_1(x'_1)^2 + \lambda_2(x'_2)^2 + \cdots + \lambda_n(x'_n)^2$$
The equivalence of (b) and (c) and the equivalence of (e) and (f) follow.

If $A = B^TB$, then $f(X) = X^TAX = X^TB^TBX = (BX)^TBX \ge 0$, and $(BX)^TBX = 0$ if and only if $BX = 0$ by Proposition 5.1. Thus (a) implies (c) and, by Proposition 2.11, (d) implies (f).

Last, assume (b) is true. Then $M = \Lambda^{1/2}$ is defined and $MM = \Lambda$. Thus $A = Q\Lambda Q^T = QMMQ^T = (QM)(QM)^T = B^TB$, where $B = (QM)^T$. Thus (b) implies (a). If (e) is true, then since both $M$ and $Q$ are invertible, the matrix $B$ is invertible, so its columns are linearly independent and (d) holds. ■

Notice that there is nothing unique about the factorization $A = B^TB$. If $C$ is any matrix such that $C^TC$ = ID 1↑⍴B (i.e., a matrix with orthonormal columns), then $(CB)^TCB = B^TC^TCB = B^TB = A$.

A symmetric matrix that satisfies condition (a), (b), or (c) is called positive. If $A$ satisfies (d), (e), or (f), $A$ is called positive definite.
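The recipe in the proof, $B = (QM)^T$ with $M = \Lambda^{1/2}$, translates directly into Python/NumPy. The test matrix below is hypothetical (built as $G^TG$ from a random $G$) and only illustrates the steps. One design difference is worth noting: `np.linalg.eigh` returns the eigenvalues as a vector, so the negligible off-diagonal roundoff that a Jacobi-style diagonalization leaves in $Q^TAQ$ never enters the square root. The last lines illustrate the non-uniqueness remark: multiplying $B$ on the left by any $C$ with orthonormal columns gives another factorization.

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))
A = G.T @ G                     # hypothetical symmetric test matrix

lam, Q = np.linalg.eigh(A)      # Q orthogonal, Q^T A Q = diag(lam)
scale = np.max(np.abs(lam))
is_positive = bool(np.all(lam >= -1e-12 * scale))          # condition (b)
is_positive_definite = bool(np.all(lam > 1e-12 * scale))   # condition (e)

M = np.diag(np.sqrt(np.clip(lam, 0.0, None)))   # M = Lambda^(1/2)
B = (Q @ M).T                                   # B = (QM)^T
err1 = np.max(np.abs(A - B.T @ B))              # always check A against B^T B

# Non-uniqueness: any C with orthonormal columns (C^T C = I) also works.
C, _ = np.linalg.qr(rng.standard_normal((6, 4)))
err2 = np.max(np.abs(A - (C @ B).T @ (C @ B)))

print(np.round(lam, 3), is_positive, is_positive_definite)
print(err1, err2)
```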


EXAMPLE 5.10 Is the matrix


Drea Ss 
a Wg iid 
Aes ie 
-| 5 62 
Be ay as 


positive? positive definite? If so, factor $A = B^TB$.
Solution First find an orthogonal matrix $Q$ such that $Q^TAQ = D$ is diagonal.

      ⎕←D←(⍉Q)+.×A+.×Q←JACOBI A
1.40E¯2   3.53E¯15 2.17E¯18 6.94E¯18
3.54E¯15  1.43E1   3.25E¯18 2.60E¯18
1.72E¯19  3.47E¯18 2.43E0   5.20E¯18
8.56E¯18  6.94E¯18 1.52E¯18 8.21E0

To three significant figures the eigenvalues are 0.014, 14.3, 2.43, and 8.21. Since the eigenvalues are all strictly positive, the matrix $A$ is positive definite.

The proof of Proposition 5.11 shows that one choice of $B$ is $B = (QM)^T$, where M = D*.5. Although the off-diagonal terms are negligible in D, notice what happens if we take square roots:

      D*.5
1.18E¯1   5.94E¯8  1.47E¯9  2.63E¯9
5.95E¯8   3.79E0   1.80E¯9  1.61E¯9
4.15E¯10  1.86E¯9  1.56E0   2.28E¯9
2.93E¯9   2.63E¯9  1.23E¯9  2.87E0

Although this is still diagonal enough for almost all purposes, there is no point in introducing added error when we can avoid it. Since ⎕CT for this system is about 3E-15, the expression D×10≠10+D will set the negligible terms equal to zero. We use 10, since the largest entry of D (14.3) has magnitude about 10.


      D×10≠10+D
0.014  0.000 0.000 0.000
0.000 14.300 0.000 0.000
0.000  0.000 2.430 0.000
0.000  0.000 0.000 8.210

      ⎕←M←(D×10≠10+D)*.5
0.118 0.000 0.000 0.000
0.000 3.790 0.000 0.000
0.000 0.000 1.560 0.000
0.000 0.000 0.000 2.870

To three significant figures $B$ is

      ⎕←B←⍉Q+.×M
0.102  0.048 ¯0.011 ¯0.034
0.632  2.800  2.080  1.940
0.369 ¯0.712  1.290 ¯0.342
1.230  0.806  0.137  2.470

Always check the accuracy of the factorization by comparing $A$ to $B^TB$.

      A-(⍉B)+.×B
 1.00E¯15  1.98E¯15  1.70E¯15  1.24E¯15
 1.98E¯15  2.21E¯15  5.69E¯16  2.53E¯16
 1.70E¯15  5.69E¯16 ¯3.05E¯16 ¯6.73E¯16
 1.24E¯15  2.53E¯16 ¯6.73E¯16 ¯7.36E¯16

Since ⎕CT for this system is about 3E-15, this error matrix is negligible compared to the original data in the matrix $A$.



If we do not replace D by D×10≠10+D, the error is much worse.

      ⎕←C←⍉Q+.×D*.5
0.102  0.048 ¯0.011 ¯0.034
0.632  2.800  2.080  1.940
0.369 ¯0.712  1.290 ¯0.342
1.230  0.806  0.137  2.470

      A-(⍉C)+.×C

6.1068 1.9667 1.2067 8, 83E-8 
1.9067 "1.2667 3486-8 1 146-8 
1.2067 °3.48E°8 «1 -78EB 3 19EB 
8.368 91. 14E 6 «9 19E BSE Bw 


If $A$ is any matrix (not necessarily square), then $A^TA$ is a positive matrix and hence has nonnegative eigenvalues.


DEFINITION 5.8 The singular values of $A$ are the nonnegative square roots of the eigenvalues of $A^TA$.

The matrices $A$ and $A^TA$ have the same rank (Proposition 2.14 and Propositions 4.26 and 4.28), and if $Q$ is orthogonal then $\Lambda = Q^T(A^TA)Q = (AQ)^TAQ$ has the same rank as $A^TA$ or $A$ (exercise 25). Suppose that the entries of $A$ are numbers derived from experimental measurements. Then the entries will tend to contain statistical errors. Random perturbations of the entries tend to increase the rank. The "true" rank of such a matrix is often estimated using the singular values. The singular values of $A$ are computed and arranged in descending order. A precipitous drop in the size of the $(k + 1)$th singular value is taken as evidence that the true rank of $A$ is $k$. This is illustrated in the next example.


EXAMPLE 5.11 Consider the matrix


A 
40-4 4 5 
so 3 0-3 
7-29 1-8 
40-41 5 
0-22 20 


which has rank equal to 3. 




      ECHELON A
 1.00000E0    0.00000E0   1.00000E0   4.99681E¯19 ¯1.00000E0
¯1.38778E¯17  1.00000E0  ¯1.00000E0   0.00000E0    1.00000E0
¯1.38778E¯17  0.00000E0   0.00000E0   1.00000E0    1.00000E0
 1.04083E¯17  0.00000E0   3.46945E¯18 5.67362E¯19  6.99889E¯18
 0.00000E0    0.00000E0   0.00000E0   0.00000E0    0.00000E0


We can add some "statistical error" to $A$ by using the APL random number generator ? (Roll). The expression ?N picks a random integer from 1 to N. Thus, for example, the expression ¯5+?9 picks a random integer between ¯4 and 4. Roll is a scalar function and operates componentwise on arrays.

Let us add some random noise to $A$ in the fourth decimal place.

      ⎕←A←A+(¯5+?5 5⍴9)×1E¯4


4000300 0 000100 3.999600 000400 5 000300 


1 
2.999700 -0.000200 3.000000 0 000100 -3 000000 
6.999700 -1.999900 8 999700 + 000400 "7 999700 
4.000300 -0 000400 -3 999800 0.999800 + 999800 
0.000300 ~1 999900 1.999900 2 000400 0 000400 
Now the rank of $A$ is 5.

ECHELON A 
1000000 —«9.42972E 15 1.364536 14 5 27356E 16 1.557008 14 
S.98480E-14 1 00000E0 1. 73264E-14 3.24914 15 2 255148 14 
119687613 6 77115E-14 1.00000€0 1.421096 14 2 27674 13 
119687613 3.94702E 14 1 13687E-13 10000060 0. 000000 
2.27974E-19 § 16549E 14 -1.13687E-19 2 B4217E-14 1 0000060 
Next compute the singular values of $A$ by diagonalizing $B = A^TA$.

      Q←JACOBI B←(⍉A)+.×A

      ⎕←D←(⍉Q)+.×B+.×Q
14254267 1 06434E-13 4.44089 16 5 20417618 9, 087196 17 
106414613 3 70445E-} 3.93087E 16 1. 38778E 17 4 O494E 14 
3.462943-16 3 61690E 16 3.975132 «7 -2uSBaE~16 2 16686E 16 
105993617 -2 99240617 7.77156E16 1 61028E) 1: 373316°18 
BO7S77E\7 4 O4439E-34 1. 8OATTE 16 —4.94395E°17 1. 99986E 10 


The singular values of $A$ are the square roots of the diagonal entries of D. We would like them sorted in descending order. The APL expression V[⍒V] sorts the components of V in descending order.

      S←(1 1⍉D)*.5
      S[⍒S]
18.3715 4.0128 0.608642 0.000377547 0.0000141409




There is a drop of several orders of magnitude from the third to the fourth singular value. ■
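The rank-estimation idea can be reproduced in Python/NumPy. Because the matrix of Example 5.11 is used only for illustration, the sketch builds its own hypothetical rank-3 5-by-5 matrix with singular values 10, 5, 1, 0, 0, adds integer noise from ¯4 to 4 times 1E¯4 (in the spirit of the APL expression (¯5+?5 5⍴9)×1E¯4), and computes singular values as in Definition 5.8, as square roots of the eigenvalues of $A^TA$ sorted in descending order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-3 test matrix with singular values 10, 5, 1, 0, 0.
U, _ = np.linalg.qr(rng.standard_normal((5, 3)))
V, _ = np.linalg.qr(rng.standard_normal((5, 3)))
A = U @ np.diag([10.0, 5.0, 1.0]) @ V.T

# Noise in the fourth decimal place: integers -4..4 times 1E-4.
noise = rng.integers(-4, 5, size=(5, 5)) * 1e-4
A_noisy = A + noise

def singular_values(M):
    """Square roots of the eigenvalues of M^T M, largest first."""
    ev = np.linalg.eigvalsh(M.T @ M)               # ascending order
    return np.sqrt(np.clip(ev, 0.0, None))[::-1]   # clip tiny negative roundoff

s_clean = singular_values(A)
s_noisy = singular_values(A_noisy)
print(s_clean)
print(s_noisy)
print(s_noisy[2] / s_noisy[3])   # size of the drop after the third value
```

Note that the "zero" singular values of the clean matrix come out near 1E¯7 rather than 1E¯15: squaring in $A^TA$ halves the number of reliable digits, which is one reason library SVD routines avoid forming $A^TA$.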


The statements made above about "true" rank are misleading. The question "What is the true rank of $A$?" is not the sort one wants to ask. A better question is "What is the order of magnitude of the effects that I am neglecting if I assume that $A$ has rank $k$?" In the example above, if we assume that the rank of $A$ is 3, we are neglecting effects several orders of magnitude below the primary effects. For the sense in which one may "assume that $A$ has rank $k$," see exercise 30.

We close this section with an application to plane geometry that illustrates the geometric significance of the singular values of a matrix.

First some terminology. We will call an affine transformation $f: R^2 \to R^2$ a stretch if $f(x, y) = (ax, y)$, $a > 0$, in some coordinate system. The map $f$ stretches line segments parallel to a fixed line $l$ by the factor $a$ and does not change the length of line segments perpendicular to $l$ (Figure 5.10). We get the formula $f(x, y) = (ax, y)$ if we choose the $x$ axis parallel to $l$.

The number $a > 0$ is called the stretch factor and the line $l$ is the stretch line. Of course if $a < 1$ then $f$ is really a compression.


Proposition 5.12 Every affine transformation $f(X) = b + AX$ from $R^2$ to $R^2$ with $A$ nonsingular can be factored as two stretches with perpendicular stretch lines followed by a congruence. The stretch factors are the singular values of $A$.


Proof We diagonalize the positive matrix $A^TA$. There is an orthogonal matrix $Q$ such that $Q^T(A^TA)Q = \Lambda = D^2$, where the matrices $\Lambda$ and $D$ are diagonal (we take the diagonal entries of $D$ positive). Since $A$ is nonsingular, $A^TA$ and $D$ are nonsingular. Let $U = AQD^{-1}$. Then $U$ is orthogonal. In fact
$$U^TU = D^{-1}Q^TA^TAQD^{-1} = D^{-1}D^2D^{-1} = I$$
Further,
$$UDQ^T = AQD^{-1}DQ^T = A$$
Now let us introduce the coordinate change
$$X = QX', \qquad Y = QY'$$
Since $Q$ is orthogonal, this is an orthonormal coordinate change, and
$$\begin{aligned}
Y &= b + AX \\
QY' &= b + AQX' \\
Y' &= Q^T(b + UDQ^TQX') \\
Y' &= Q^Tb + Q^TUDX'
\end{aligned}$$
Thus in the new coordinate system $f(X') = g(h(X'))$, where $h(X') = DX'$ and $g(X') = Q^Tb + Q^TUX'$, a congruence, since $Q^TU$ is orthogonal.

[Figure 5.10: A stretch: $d(f(a), f(c)) = d(a, c)$, $d(f(a), f(b)) = a\,d(a, b)$.]


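Now
$$h(X') = DX' = \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix}X' = \begin{bmatrix} 1 & 0 \\ 0 & d_2 \end{bmatrix}\begin{bmatrix} d_1 & 0 \\ 0 & 1 \end{bmatrix}X'$$
is a stretch with stretch factor $d_1$ and stretch line equal to the $x'$ axis followed by a stretch with stretch factor $d_2$ and stretch line equal to the $y'$ axis. The stretch factors $d_1, d_2$ are the singular values of $A$. ■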


The above proposition easily generalizes to affine maps $f: R^n \to R^n$ with nonsingular linear part. The only change is that $D$ is then the product of $n$ stretches in mutually perpendicular directions.

The columns of the matrix $Q$ defined in the proof of Proposition 5.12 are vectors parallel to the stretch lines. Such vectors are called singular vectors of $A$.
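The construction in the proof of Proposition 5.12 can be traced step by step in Python/NumPy. The linear part `A`, the translation `b`, and the test point `Xp` below are hypothetical values chosen for illustration. The sketch checks that $U = AQD^{-1}$ is orthogonal, that $A = UDQ^T$, that in the new coordinates $f$ is the stretch $D$ followed by the congruence $X' \mapsto Q^Tb + Q^TUX'$, and that the stretch factors agree with NumPy's own singular values.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.5, 3.0]])   # hypothetical nonsingular linear part
b = np.array([1.0, -2.0])                # hypothetical translation part

lam, Q = np.linalg.eigh(A.T @ A)         # Q^T (A^T A) Q = Lambda = D^2
d = np.sqrt(lam)                         # stretch factors
D = np.diag(d)
U = A @ Q @ np.diag(1 / d)               # U = A Q D^(-1)

# New coordinates X = QX', Y = QY':  Y' = Q^T b + (Q^T U) D X'
Xp = np.array([0.3, -0.7])               # arbitrary point, new coordinates
Y_direct = Q.T @ (b + A @ (Q @ Xp))      # apply f in old coordinates, convert
Y_factored = Q.T @ b + (Q.T @ U) @ (D @ Xp)

print(np.allclose(U.T @ U, np.eye(2)))        # U is orthogonal
print(np.allclose(U @ D @ Q.T, A))            # A = U D Q^T
print(np.allclose(Y_direct, Y_factored))      # stretch, then congruence
print(np.sort(d), np.sort(np.linalg.svd(A, compute_uv=False)))
```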


EXERCISES 5.2 


1. Write an APL function ANG to be used by JACOBI. See Exercise 12 of Exercises 4.3.


In exercises 2 through 7 a symmetric 2-by-2 matrix is given. Use Proposition 4.11 to find an orthogonal matrix $Q$ such that $Q^TAQ$ is diagonal. Write down the eigenvalues of $A$. Is $A$ positive or positive definite? If $A$ is positive, factor $A$ as $A = B^TB$.

1-1 


2 a=[}. 3] sa=[) i] 4 ae[) 


on 0: = cos2@ sine 
a 4=[| al ” 4=( ni i‘ a=[sn30 cos 20 


In exercises 8 through 12 use Proposition 4.11 to find the singular values of the matrix A. 
12 43 4 =| 
= % A= 10.4 
el [il [os 
; te 
1 oo 2A Hie 


(Computer assignment) In exercises 13 through 17 a symmetric matrix $A$ is given. Use the function JACOBI to find an orthogonal matrix $Q$ such that $Q^TAQ$ is diagonal. Write down the eigenvalues of $A$. Is $A$ positive or positive definite? If $A$ is positive, factor $A$ as $A = B^TB$.


o1 0 

ee 110 

Tic el se 14. Apr 
Maes: ema 

PEt ier cet 
wach 6. A [ie 
ebb r12 


17. The 6-by-6 Hilbert matrix. The $i;j$ component of this matrix is $1/(i + j - 1)$.
Note: JACOBI converges slowly on Hilbert matrices.


(Computer assignment) Discuss the critical points of the quadratic functions defined in exercises 18 through 22.


18, f(s y2) = 3 + Tx — Dy ae $x? = dey + yz — + 9 


19, f(s,y,2) = 2 = Tu + $2 — 2x = 9? — 228 4 
1 Oo -I oO 
- 0 
20, fixy=d4[) 0-7 HAT is 3 4 Was 
0 0 1-1 
2 flay 2) = <1 + + 2p + Bz 4 et — Dey + py? + Dez + Sz? + Dye 


22, flxiy2) = 6+ 2x + Ae + Qe? — Dey + p24 Dee + Se? + Dy, 
23. This problem assumes familiarity with the optional Section 2.5*. The critical points of a function $f: R^n \to R$ are the points at which the derivative $Df(X)$ is zero. Using the computation of Example 2.48, show that a quadratic function $f(X) = c + BX + X^TAX$ has critical points if and only if there is a change of variable that eliminates the linear term.


24. Show that the singular values of an orthogonal matrix are all equal to 1.


25. Let $A$ be $n$ by $n$ and $P$ be $n$ by $n$ and invertible. Show that $A$ and $P^{-1}AP$ have the same rank.
Hint: Propositions 4.25 and 4.26.

26. Prove Proposition 5.8.
Hint: Proposition 2.1.

27. Show that the matrix $Q$ defined on lines [5] and [6] of JACOBI is orthogonal.
Hint: Imitate the calculation in the proof of Proposition 5.9.

28. Show that statement 2 of Proposition 5.9 implies statement 1.


29. (Singular-value decomposition) Let $A$ be a matrix and $Q$ an orthogonal matrix such that $Q^T(A^TA)Q = D^2$, where $D$ is diagonal. Let (1 1⍉D)[K] be the nonzero diagonal entries of D (i.e., $A$, $A^TA$, and $D$ all have rank ⍴K). The singular-value decomposition of $A$ is
$$A = U\Delta V$$
where $\Delta = D[K;K]$ is diagonal, $V = Q[\,;K]^T$ (the transpose of a matrix with orthonormal columns), and $U = AQ[\,;K]D[K;K]^{-1}$ has orthonormal columns. The object of this exercise is to verify the existence of the singular-value decomposition. Let $L$ be the vector of column indices of $D$ that are not components of $K$.
(a) Show that $Q[\,;K]^T(A^TA)Q[\,;K] = D[K;K]^2$.
Hint: $D[K;K]^2 = (Q^T(A^TA)Q)[K;K]$. Apply Proposition 3.8.
(b) Show that $U$ has orthonormal columns; that is, $U^TU = I$.
(c) Show that $AQ[\,;L] = 0$.
Hint: $0 = (Q^TA^TAQ)[\,;L] = Q^TA^TAQ[\,;L]$ and Proposition 2.14.
(d) Show that $QQ^T = Q[\,;K]Q[\,;K]^T + Q[\,;L]Q[\,;L]^T$.
Hint: $QQ^T = (QQ^T)[N;N]$, where N = ⍳1↑⍴Q. Apply Proposition 5.8.
(e) Show that $A = U\Delta V$.
30. Apply the singular-value decomposition (exercise 29) to the "perturbed" matrix $A$ of Example 5.11 by assuming that the two smallest eigenvalues are zero. Compute the error matrix $B = A - U\Delta V$.


Generalized Dot Products 
31. Let $\langle X, Y\rangle = X^TY$ denote the dot product in $\mathbb{R}^n$. Let $X = PX'$ be a coordinate change. Show that in the new coordinate system $\langle X, Y\rangle = X'^TAY'$, where A is symmetric and positive definite.

Hint: Substitute $X = PX'$ and $Y = PY'$.
32. Define $\langle\,,\rangle$ to be a generalized dot product if $\langle X, Y\rangle = X^TAY$ for some symmetric and positive-definite matrix A. Show that if $\langle\,,\rangle$ is a generalized dot product, then there is a coordinate change $X' = PX$ such that in the new coordinate system $\langle X, Y\rangle = X'^TY'$. Further, the new axes may be chosen mutually perpendicular (although there will be scale changes).

Hint: Factor $A = P^TP$.
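33. Let $\langle\,,\rangle$ be a dyadic function on $\mathbb{R}^n$ such that

(i) $\langle X, Y\rangle = \langle Y, X\rangle$, all X, Y in $\mathbb{R}^n$
(ii) $\langle X, Y_1 + Y_2\rangle = \langle X, Y_1\rangle + \langle X, Y_2\rangle$, all $X, Y_1, Y_2$ in $\mathbb{R}^n$
(iii) $\langle X, aY\rangle = a\langle X, Y\rangle$, all X, Y in $\mathbb{R}^n$ and scalars a
(iv) $\langle X, X\rangle \ge 0$, and $\langle X, X\rangle = 0$ if and only if $X = 0$

(a) Show that we must have

(ii') $\langle X_1 + X_2, Y\rangle = \langle X_1, Y\rangle + \langle X_2, Y\rangle$, all $X_1, X_2, Y$ in $\mathbb{R}^n$
(iii') $\langle aX, Y\rangle = a\langle X, Y\rangle$, all X, Y in $\mathbb{R}^n$ and scalars a

(b) Show that $\langle\,,\rangle$ is a generalized dot product.

Hint: $A[i;j] = \langle I[;i], I[;j]\rangle$, where I is an identity matrix.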


Simultaneous Diagonalization of Two Quadratic Forms 


34. Let $f(X) = X^TAX$, $g(X) = X^TBX$ be two quadratic forms on $\mathbb{R}^n$ with f positive definite. Show that there is a coordinate change $X = PX'$ such that $f(X) = X'^TX'$ and $g(X) = X'^T\Lambda X'$, where $\Lambda$ is diagonal.
Hint: Apply the result of exercise 32 to the generalized dot product $\langle X, Y\rangle = X^TAY$ to obtain a new coordinate system in which $\langle X'', Y''\rangle = X''^TY''$ is the ordinary dot product. Then apply Proposition 5.11 to the matrix of g in the $X''$ coordinates.




5.3* (Multivariate Calculus) Optimization:
the Second-Derivative Test


This section is a continuation of Section 2.5*. Proposition 2.23, the second-derivative test for max-min, states that the character of a critical point p of a twice continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ is the same as that of the quadratic form $Q(X) = X^TD^2f(p)X$ at the origin.

We will not prove this fact but will give a brief heuristic argument. 

The first three terms of the Taylor expansion of f at p are (Proposition 2.25)

$$P(X) = f(p) + Df(p)(X - p) + \tfrac{1}{2}(X - p)^TD^2f(p)(X - p)$$


If p is a critical point, then by definition $Df(p) = 0$. Changing coordinates so that p becomes the origin ($X = p + X'$) gives

$$P(X') = f(p) + \tfrac{1}{2}X'^TD^2f(p)X'$$

for this quadratic function. Clearly $P(X')$ has a maximum or minimum at the origin if and only if $Q(X')$ does. Thus Proposition 2.23 states that the character of a critical point p of f is determined by the first three terms of the Taylor expansion of f.

The function JACOBI of Section 5.2 makes the second-derivative test practical by making it easy to analyze the function $Y = Q(X)$ at the origin. If P = JACOBI $D^2f(p)$ and $X = PX'$, then in the $X'$ coordinate system we have

$$Y = Q(X) = X'^T\begin{bmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{bmatrix}X' = \lambda_1x_1'^2 + \lambda_2x_2'^2 + \cdots + \lambda_nx_n'^2$$

where the $\lambda_i$'s are the eigenvalues of $D^2f(p)$.

If $\lambda_i \ge 0$ for all i, then $Q(0) = 0$ is the minimum value of Q. If $\lambda_i \le 0$ for all i, then $Q(0) = 0$ is the maximum value of Q.

If $\lambda_i > 0$ for all i, then the minimum value $Q(X) = 0$ is attained only for $X = 0$. If, for example, $\lambda_1 = 0$, then $Q((a, 0, \ldots, 0)) = 0$ for any a, so the minimum is attained on a whole line of points X. Similarly for maxima. In this case $D^2f(p)$ is singular and the second-derivative test fails.

If some $\lambda_i > 0$ and some $\lambda_j < 0$, then $0 = Q(0)$ is neither a maximum nor a minimum. In fact, in the $X'$ coordinates, $Q(e_i) = \lambda_i > 0$ and $Q(e_j) = \lambda_j < 0$, where $e_i$ (in APL, i=⍳n) is the ith unit vector. We shall refer to this third case as a saddle. Thus we shall classify critical points into only three types: maxima, minima, and saddles.
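The three cases above amount to a test on the signs of the eigenvalues of $D^2f(p)$. The sketch below is written in Python rather than APL: `jacobi_eigenvalues` is a hypothetical stand-in for the book's JACOBI (plain cyclic Jacobi rotations, returning only the eigenvalues), and `classify` applies the three cases, with an assumed tolerance `tol` for deciding when an eigenvalue counts as zero.

```python
import math


def mat_mul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations.

    Each rotation J is chosen so that entry (p, q) of J^T A J is zero;
    repeated sweeps drive the off-diagonal part of A toward zero.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0:
                    continue
                # tan(2 theta) = 2 a_pq / (a_qq - a_pp) makes the new a_pq zero
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                j = [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]
                j[p][p], j[q][q], j[p][q], j[q][p] = c, c, s, -s
                a = mat_mul(transpose(j), mat_mul(a, j))
    return [a[i][i] for i in range(n)]


def classify(hessian, tol=1e-9):
    """Second-derivative test following the three cases in the text."""
    eig = jacobi_eigenvalues(hessian)
    if any(lam > tol for lam in eig) and any(lam < -tol for lam in eig):
        return "saddle"
    if any(abs(lam) <= tol for lam in eig):
        return "test fails"  # D^2 f(p) is singular
    return "minimum" if all(lam > 0 for lam in eig) else "maximum"


print(classify([[6, 2], [2, 6]]))      # minimum
print(classify([[-6, -2], [-2, 10]]))  # saddle
```

The two sample matrices are $2\begin{bmatrix}3 & 1\\1 & 3\end{bmatrix}$ and $2\begin{bmatrix}-3 & -1\\-1 & 5\end{bmatrix}$, the second derivatives at two of the critical points in Example 5.12.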


EXAMPLE 5.12. The most difficult part of a max-min problem is finding the critical points. The geometry of the following example makes the critical points relatively easy to locate.




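$$p_1(x) = \begin{bmatrix}2\\0\end{bmatrix} + \begin{bmatrix}\cos x\\ \sin x\end{bmatrix}, \qquad 0 \le x < 2\pi$$

$$p_2(y) = \begin{bmatrix}-2\\0\end{bmatrix} + \begin{bmatrix}\cos y\\ \sin y\end{bmatrix}, \qquad 0 \le y < 2\pi$$

As x and y vary, the points $p_1(x)$, $p_2(y)$ vary on the circles in Figure 5.11. Let $d(x, y) = \|p_1(x) - p_2(y)\|$. Then clearly d has a minimum at $(x, y) = (\pi, 0)$ and a maximum at $(x, y) = (0, \pi)$. Let us check this with the second-derivative test. As usual with distances we avoid square roots by working with $f(x, y) = d(x, y)^2$.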


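$$f(x, y) = \|p_1(x) - p_2(y)\|^2 = (4 + \cos x - \cos y)^2 + (\sin x - \sin y)^2$$

$$\begin{aligned}Df(x, y) &= \big[\,2(4 + \cos x - \cos y)(-\sin x) + 2(\sin x - \sin y)\cos x,\;\; 2(4 + \cos x - \cos y)(\sin y) + 2(\sin x - \sin y)(-\cos y)\,\big]\\ &= 2\big[-4\sin x + \sin(x - y),\;\; 4\sin y - \sin(x - y)\big]\end{aligned}$$

$$D^2f(x, y) = 2\begin{bmatrix}-4\cos x + \cos(x - y) & -\cos(x - y)\\ -\cos(x - y) & 4\cos y + \cos(x - y)\end{bmatrix}$$

$$D^2f(\pi, 0) = 2\begin{bmatrix}3 & 1\\ 1 & 3\end{bmatrix}, \qquad D^2f(0, \pi) = 2\begin{bmatrix}-5 & 1\\ 1 & -5\end{bmatrix}$$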


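Both of these matrices can be diagonalized by a 45° rotation. Taking

$$Q = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}$$

we obtain

$$Q^TD^2f(\pi, 0)Q = 2\begin{bmatrix}4 & 0\\ 0 & 2\end{bmatrix}, \qquad Q^TD^2f(0, \pi)Q = 2\begin{bmatrix}-4 & 0\\ 0 & -6\end{bmatrix}$$

As expected, the critical point $(\pi, 0)$ is a minimum, since the eigenvalues of $D^2f(\pi, 0)$ are positive, and the critical point $(0, \pi)$ is a maximum, since the eigenvalues of $D^2f(0, \pi)$ are negative.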


FIGURE 5.11 




There are two other critical points for f(x, y): (0, 0) and $(\pi, \pi)$.

$$D^2f(0, 0) = 2\begin{bmatrix}-3 & -1\\ -1 & 5\end{bmatrix}, \qquad D^2f(\pi, \pi) = 2\begin{bmatrix}5 & -1\\ -1 & -3\end{bmatrix}$$

From Figure 5.11, both these critical points have the same character, since they give congruent figures.


      ⎕←Q←JACOBI A←2 2⍴¯3 ¯1 ¯1 5
0.993 ¯0.122
0.122  0.993
      (⍉Q)+.×A+.×Q
¯3.12E0 1.73E¯18
 0.00E0  5.12E0


The eigenvalues differ in sign; thus these two critical points are saddles. 


The fact that the critical points (0, 0) and $(\pi, \pi)$ in the example above are saddles can also be deduced from Figure 5.11.

If we fix $p_1$ at $(3, 0) = p_1(0)$ and let $p_2$ vary in a small neighborhood of $(-1, 0) = p_2(0)$, then the length $d(0, y) = \|p_1(0) - p_2(y)\|$ has a local minimum at $y = 0$. On the other hand, if we fix $p_2$ at $(-1, 0)$ and let $p_1$ vary in a small neighborhood of $(3, 0) = p_1(0)$, then $d(x, 0) = \|p_1(x) - p_2(0)\|$ has a local maximum at $x = 0$.

To rephrase: with x held fixed at 0, d has a local minimum as a function of y at (0, 0), whereas, with y held fixed at 0, d has a local maximum as a function of x at (0, 0).

This means that the curve $y = 0$ on the surface $z = d(x, y) = \|p_1(x) - p_2(y)\|$ has a local maximum at (0, 0), whereas the curve $x = 0$ on this surface has a local minimum. Thus the surface must be a saddle (Figure 5.12).

More generally, consider the coordinate curves obtained by letting each $x_i$ vary while the rest of the variables remain constant. If some of these curves have maxima and others have minima at the critical point, then the critical point must be a saddle. In general, however, this is all that can be said without diagonalizing the second derivative at the critical point. It can happen that these curves all have minima, say, and the point is nonetheless a saddle (exercise 1).

FIGURE 5.12

The use of JACOBI in Example 5.12 was not necessary. Proposition 4.11 
could have been used, since only two variables were involved. JACOBI is needed 
for three or more variables, however. 


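EXAMPLE 5.13 Verify that the point (1, 0, -1, 0, 2, 2) is a critical point of the function

$$f(x, y, z, w, u, v) = (x - z)^2 + (y - w)^2 + u\big((x - 2)^2 + y^2 - 1\big) + v\big((z + 2)^2 + w^2 - 1\big)$$

What is the character (i.e., max, min, or saddle) of this critical point?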


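Solution

$$Df(x, y, z, w, u, v)^T = \begin{bmatrix}2(x - z) + 2u(x - 2)\\ 2(y - w) + 2uy\\ -2(x - z) + 2v(z + 2)\\ -2(y - w) + 2vw\\ (x - 2)^2 + y^2 - 1\\ (z + 2)^2 + w^2 - 1\end{bmatrix}$$

and $Df(1, 0, -1, 0, 2, 2) = 0$, so we do have a critical point. Further,

$$D^2f = 2\begin{bmatrix}1+u & 0 & -1 & 0 & x-2 & 0\\ 0 & 1+u & 0 & -1 & y & 0\\ -1 & 0 & 1+v & 0 & 0 & z+2\\ 0 & -1 & 0 & 1+v & 0 & w\\ x-2 & y & 0 & 0 & 0 & 0\\ 0 & 0 & z+2 & w & 0 & 0\end{bmatrix}$$

so that $D^2f(1, 0, -1, 0, 2, 2) = 2A$, where

      A
 3  0 ¯1  0 ¯1  0
 0  3  0 ¯1  0  0
¯1  0  3  0  0  1
 0 ¯1  0  3  0  0
¯1  0  0  0  0  0
 0  0  1  0  0  0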




      Q←JACOBI A
      (⍉Q)+.×A+.×Q

The printed result is diagonal to within round-off (every off-diagonal entry is of order 1E¯16 or smaller), with diagonal entries

2.41E0 2.00E0 4.24E0 4.00E0 ¯4.14E¯1 ¯2.36E¯1

Since there are both positive and negative eigenvalues, the point is a saddle point.
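A quick way to double-check this conclusion without diagonalizing is to exhibit two directions in which the quadratic form of $D^2f$ takes opposite signs; that alone forces eigenvalues of both signs. The Python sketch below does this; it is a substitute for the JACOBI computation, not the book's method. Because the printed statement of f is partly illegible, the function here is reconstructed from the partial derivatives shown in the solution and should be treated as an assumption.

```python
def f(x, y, z, w, u, v):
    # Assumption: reconstructed from the partial derivatives in the solution.
    return ((x - z) ** 2 + (y - w) ** 2
            + u * ((x - 2) ** 2 + y ** 2 - 1)
            + v * ((z + 2) ** 2 + w ** 2 - 1))


def grad(x, y, z, w, u, v):
    """Df, component by component, as printed in the solution."""
    return [2 * (x - z) + 2 * u * (x - 2),
            2 * (y - w) + 2 * u * y,
            -2 * (x - z) + 2 * v * (z + 2),
            -2 * (y - w) + 2 * v * w,
            (x - 2) ** 2 + y ** 2 - 1,
            (z + 2) ** 2 + w ** 2 - 1]


def hessian(x, y, z, w, u, v):
    """D^2 f: twice the symbolic matrix shown in the solution."""
    return [[2 + 2 * u, 0, -2, 0, 2 * (x - 2), 0],
            [0, 2 + 2 * u, 0, -2, 2 * y, 0],
            [-2, 0, 2 + 2 * v, 0, 0, 2 * (z + 2)],
            [0, -2, 0, 2 + 2 * v, 0, 2 * w],
            [2 * (x - 2), 2 * y, 0, 0, 0, 0],
            [0, 0, 2 * (z + 2), 2 * w, 0, 0]]


def quadratic_form(h, t):
    """Q(T) = T^T H T."""
    n = len(t)
    return sum(t[i] * h[i][j] * t[j] for i in range(n) for j in range(n))


p = (1, 0, -1, 0, 2, 2)
H = hessian(*p)
print(grad(*p))                                # [0, 0, 0, 0, 0, 0]
print(quadratic_form(H, [1, 0, 0, 0, 0, 0]))   # 6
print(quadratic_form(H, [1, 0, 0, 0, 4, 0]))   # -10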


EXERCISES 5.3* 


1. $f(x, y) = x^2 - 3xy + y^2$, being quadratic, has $p = (0, 0)$ as its only critical point.
(a) Show that the functions $h(x) = f(x, 0)$, $g(y) = f(0, y)$ both have minima at 0.

(b) Show that (0, 0) is a saddle point of f.
2. Let $p_1(x)$, $p_2(y)$, $p_3(z)$ be the indicated vectors in Figure 5.13, $0 \le x, y, z < 2\pi$. Let


Nave 


PONE + PADI? + lasedi? 
(a) Show that 
Ml 9.2) = 15 + Moos x + sin y — cos 2) 


(b) Compute Df and find the critical points of f.
(c) Compute $D^2f$ and classify the critical points of f.


3. (Computer assignment) Let f be the function of Example 5.13. Verify that the following points are critical points of f and determine their character.


(a) (3, 0, -3, 0, -6, -6)
(b) (1, 0, -3, 0, 4, -4)
(c) (3, 0, -1, 0, -4, 4)


FIGURE 5.13 




5.4 Perpendicular Projections and Least Squares 


In this section we exhibit the connection between the ideas of perpendicularity and shortest distance. It is an elaboration of the theorem from plane geometry that the shortest distance from a point to a line is attained at the foot of the perpendicular dropped from the point to the line. There is a close connection between these ideas, the least-squares approximations treated in Section 2.3, and the left inverse computed by ⌹.

We begin with the idea of breaking a vector into two components, one parallel 
to and one perpendicular to a given flat. First some notation. 


DEFINITION 5.9 Let S be a set of vectors in $\mathbb{R}^n$. The set of vectors in $\mathbb{R}^n$ perpendicular to S is denoted by $S^\perp$. If S is a subspace of $\mathbb{R}^n$, then $S^\perp$ is called the orthogonal complement of S.

We use the notation $\{v_1, v_2, \ldots, v_k\}$ to denote the set of vectors consisting of $v_1, v_2, \ldots, v_k$. If S is the empty set, we take $S^\perp = \mathbb{R}^n$.


PROPOSITION 5.13 The set of vectors $S^\perp$ is always a subspace of $\mathbb{R}^n$. If S is a subspace of $\mathbb{R}^n$ and S is generated by $v_1, v_2, \ldots, v_k$, then $S^\perp = \{v_1, v_2, \ldots, v_k\}^\perp$.


Proof We use Proposition 4.24 to verify that $S^\perp$ is a subspace. The zero vector is in $S^\perp$, since the zero vector is perpendicular to all vectors.

A vector v is in $S^\perp$ if and only if $v \cdot s = 0$ for every s in S. Writing v and s as column vectors, this becomes $v^Ts = 0$, and

$$(av)^Ts = a(v^Ts)$$
$$(v + w)^Ts = (v^T + w^T)s = v^Ts + w^Ts$$

so if v and w are in $S^\perp$, so are $av$ and $v + w$. Last, suppose that $A[;i] = v_i$. Then S is the column space of A, and $v \perp v_i$ for $i = 1, 2, \ldots, k$ if and only if $v^TA = 0$. Suppose that v is in $\{v_1, \ldots, v_k\}^\perp$ and s is in S. Then $s = AT$ for some $T = (t_1, \ldots, t_k)$, so $v^Ts = v^TAT = 0$ and v is in $S^\perp$. Conversely, if v is in $S^\perp$, then $v^TA = 0$, since each $v_i$ is in S.


Suppose that S is the column space of A. Then v is in $S^\perp$ if and only if $v^TA = 0$ or, alternately, $A^Tv = 0$. This means that $S^\perp$ is the null space of $A^T$ and hence can be computed by row reduction.


EXAMPLE 5.14 Compute $S^\perp$, where

$$S = \{(1, 1, 5, 9),\ (-2, -1, -7, -13),\ (-1, -3, -11, -19)\}$$




Solution

      A
1  ¯2  ¯1
1  ¯1  ¯3
5  ¯7 ¯11
9 ¯13 ¯19
      ECHELON ⍉A
1.000 0.000 2.000 4.000
0.000 1.000 3.000 5.000
0.000 0.000 0.000 0.000

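The null space is given by

$$\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = a\begin{bmatrix}-2\\-3\\1\\0\end{bmatrix} + b\begin{bmatrix}-4\\-5\\0\\1\end{bmatrix}$$

Thus $S^\perp$ is the subspace with basis (-2, -3, 1, 0) and (-4, -5, 0, 1).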
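The recipe used in Example 5.14 (row-reduce the matrix whose rows are the generators, then read off the null space) can be sketched in Python with exact rational arithmetic. The helper names `rref` and `orth_complement` are illustrative, not from the book, and the third generator is taken as (-1, -3, -11, -19), our reading of the partly illegible example statement.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form in exact arithmetic; returns (matrix, pivot columns)."""
    m = [[Fraction(t) for t in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        m[r], m[piv] = m[piv], m[r]
        pv = m[r][c]
        m[r] = [t / pv for t in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def orth_complement(generators):
    """Basis of S-perp = null space of A^T, whose rows are the generators of S."""
    m, pivots = rref(generators)
    n = len(generators[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)          # set one free variable to 1
        for row, pc in enumerate(pivots):
            x[pc] = -m[row][free]      # solve for the pivot variables
        basis.append(x)
    return basis


S = [(1, 1, 5, 9), (-2, -1, -7, -13), (-1, -3, -11, -19)]
for b in orth_complement(S):
    print([str(t) for t in b])
# ['-2', '-3', '1', '0']
# ['-4', '-5', '0', '1']
```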


If S is the column space of A, then $\dim S = \operatorname{rank} A = \operatorname{rank} A^T = n - \dim(\text{null space of } A^T)$ by Propositions 4.26 and 4.28. Thus $\dim(S) + \dim(S^\perp) = n$. Notice that S and $S^\perp$ intersect in a point, the origin. This is because a vector v in both S and $S^\perp$ is perpendicular to itself; that is, $v^Tv = 0$ and hence $v = 0$.

The next proposition shows that one may always break a vector into components parallel and perpendicular to a given flat or subspace.


PROPOSITION 5.14 Let S be a subspace of $\mathbb{R}^n$ and let v be a vector in $\mathbb{R}^n$. There are unique vectors $v^\parallel$ in S and $v^\perp$ in $S^\perp$ such that

$$v = v^\parallel + v^\perp$$

Proof Let the columns of A be a basis for S and the columns of B a basis for $S^\perp$. We wish to find $v^\parallel = AX$ and $v^\perp = BY$ such that

$$v = v^\parallel + v^\perp = AX + BY = [A \mid B]\begin{bmatrix}X\\Y\end{bmatrix}$$

Since $\dim(S) + \dim(S^\perp) = n$, the matrix $[A \mid B]$ is n by n. If $[A \mid B]$ is invertible, then the above equation has a unique solution and the proposition is proved. To show $[A \mid B]$ invertible it is sufficient to show that $[A \mid B]$ has independent columns.




Suppose that $\begin{bmatrix}X_0\\Y_0\end{bmatrix}$ is a solution of the corresponding homogeneous equation. Then

$$0 = [A \mid B]\begin{bmatrix}X_0\\Y_0\end{bmatrix} = AX_0 + BY_0$$

and hence $AX_0 = -BY_0$. This means that the vector $v = AX_0 = B(-Y_0)$ is in both S and $S^\perp$. Hence $v = 0$, which means that

$$AX_0 = 0, \qquad BY_0 = 0$$

But the matrices A and B have independent columns; hence $X_0 = 0$ and $Y_0 = 0$. We have shown that the homogeneous equation has only the trivial solution, so $[A \mid B]$ has independent columns.


DEFINITION 5.10 With the notation of Proposition 5.14, $v^\parallel$ is the component of v parallel to S; $v^\perp$ is the component of v perpendicular to S. The vector $v^\parallel$ is also called the perpendicular projection of v onto S.

In the plane and in space $v^\parallel$ is the foot of the perpendicular dropped from the endpoint of the vector v to the subspace S (Figure 5.14).


PROPOSITION 5.15 Let S be a subspace of $\mathbb{R}^n$ and v a vector in $\mathbb{R}^n$. The unique vector in S that is closest to v is $v^\parallel$, the perpendicular projection of v onto S.


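Proof Let w be any vector in S. Notice that $v^\parallel - w$ in S implies $(v^\parallel - w) \perp v^\perp$. Then

$$\begin{aligned}d(v, w)^2 = \|v - w\|^2 &= \|v - v^\parallel + v^\parallel - w\|^2 = \|v^\perp + (v^\parallel - w)\|^2\\ &= \|v^\perp\|^2 + \|v^\parallel - w\|^2 \quad\text{by the Pythagorean theorem (Proposition 5.2)}\\ &\ge \|v^\perp\|^2\end{aligned}$$

and we get equality only when $v^\parallel - w = 0$.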


Suppose that S is the column space of A. To compute $v^\parallel$ we could proceed as follows. Since $v^\parallel$ is in S, $v^\parallel = AX$ for some X. Since $v^\perp$ is perpendicular to S, $A^Tv^\perp = 0$. Since $v^\perp = v - v^\parallel$, we have that $v^\parallel = AX$ for any solution X of

$$A^T(v - AX) = 0$$

or

$$A^TAX = A^Tv$$

FIGURE 5.14

This gives an algebraic proof of Proposition 2.15. With this approach we do not need to assume that A has a left inverse. Although $v^\parallel$ is unique given v and S, the equation $v^\parallel = AX$ may have many solutions if A does not have linearly independent columns. Note that the function (B−AX)+.*2 of Proposition 2.15 is just $\|B - AX\|^2$.


PROPOSITION 5.16 Let A be a matrix, B a vector, and set $f(X) = \|B - AX\|$. The function f has a minimum at any point X satisfying the equation

$$A^TAX = A^TB$$

In particular, if A has linearly independent columns, then the perpendicular projection $v^\parallel$ of a vector V onto the column space of A is just A+.×V⌹A.

EXAMPLE 5.15 Let S be the subspace of $\mathbb{R}^5$ generated by (1, 1, 0, 2, 3) and (3, 2, -7, 1, 0). Let v = ⍳5. Write $v = v^\parallel + v^\perp$. What is the distance from v to S?

Solution Let us denote $v^\parallel$ by PROJV and $v^\perp$ by PERPV. Then

      A←⍉2 5⍴1 1 0 2 3 3 2 ¯7 1 0
      ⎕←PROJV←A+.×(V←⍳5)⌹A
0.795 1.17 2.59 3.44 5.72
      ⎕←PERPV←V−PROJV
0.205 0.835 0.406 0.558 ¯0.719
      ⎕←DIST←(PERPV+.*2)*÷2
1.32
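For readers without an APL system, here is a Python sketch of the same computation. It forms the normal equations $A^TAX = A^Tv$ of Proposition 5.16, solves them exactly with fractions (assuming, as in this example, that the columns of A are independent), and then forms $v^\parallel = AX$ and $v^\perp = v - v^\parallel$. The function names are illustrative.

```python
from fractions import Fraction
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(m, rhs):
    """Gauss-Jordan elimination in exact arithmetic; m is assumed invertible."""
    n = len(m)
    aug = [[Fraction(t) for t in row] + [Fraction(r)] for row, r in zip(m, rhs)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [t / pv for t in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                factor = aug[i][c]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def project(columns, v):
    """Return A X, where X solves the normal equations A^T A X = A^T v."""
    gram = [[dot(a, b) for b in columns] for a in columns]  # A^T A
    x = solve(gram, [dot(a, v) for a in columns])           # right side A^T v
    return [sum(xj * col[i] for xj, col in zip(x, columns)) for i in range(len(v))]


cols = [(1, 1, 0, 2, 3), (3, 2, -7, 1, 0)]  # the generators of S
v = list(range(1, 6))                        # v = iota 5
proj = project(cols, v)
perp = [a - b for a, b in zip(v, proj)]
dist = math.sqrt(dot(perp, perp))
print([round(float(t), 3) for t in proj])
print(round(dist, 2))  # 1.32, as in the APL session
```

Working with fractions keeps $v^\perp$ exactly perpendicular to both generators, which is easy to confirm with `dot`.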


ACCOUNTING FOR VARIATION 


Let us take a closer look at how these geometric ideas involving perpendicular projections and shortest distances relate to the least-squares curve-fitting calculations of Section 2.3.

Suppose that some characteristic of a group of individuals is measured, resulting in a vector of values $Y = (y_1, \ldots, y_n)$. For example, $y_i$ might be the weight of the ith individual. The numbers $y_i$, of course, vary from person to person, and we want to "account for" this variation. More precisely, we wish to find out "how much" of the variation in weight is "due to" variation in height.

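First we need a measure of the "variation." The standard approach is to take the standard deviation or variance. Let $\bar y$ be the mean of the y's. Then the variance is

$$\frac{\sum_{i=1}^{n}(y_i - \bar y)^2}{n} = \frac{\|Y - \bar Y\|^2}{n}$$

where $\bar Y$ is the vector n⍴$\bar y$, all of whose components equal $\bar y$. The n in the denominator is just a scaling factor here and is usually omitted. The measure of variation is the total sum of squares:

$$\text{total SSQ} = \|Y - \bar Y\|^2 = \sum_{i=1}^{n}(y_i - \bar y)^2$$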


In Section 2.3, recall that we discussed fitting a polynomial of degree k, $y = c_0 + c_1x + \cdots + c_kx^k$, to measured data points $(x_i, y_i)$ (cf. Proposition 2.16). If $k = 0$, the resulting constant polynomial is $y = \bar y$. In fact the system $AX = B$ to be solved is

$$\begin{bmatrix}1\\1\\\vdots\\1\end{bmatrix}[c_0] = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}$$

Multiplying through by $A^T = [1\ 1\ \cdots\ 1]$, we obtain

$$nc_0 = \sum_{i=1}^{n}y_i \qquad\text{or}\qquad c_0 = \bar y$$

This means that, in $\mathbb{R}^n$, $\bar Y = Ac_0$ is the perpendicular projection of Y onto the column space of A = (n,1)⍴1 (Figure 5.15).† In particular, $(Y - \bar Y) \perp \bar Y$.

Now suppose that the heights of the individuals are $X = (x_1, \ldots, x_n)$. To find the least-squares line $y = c_0 + c_1x$ expressing weight as a function of height, we find the least-squares solution of $AC = Y$, where $C = (c_0, c_1)$ and the columns of A are n⍴1 and X. Then $\hat Y = AC$ is the perpendicular projection of Y onto the column space of A (Figure 5.16).

†Next time you see someone calculating averages, be sure to tell him this.

FIGURE 5.15

Now $Y^\perp = Y - \hat Y$ is perpendicular to all vectors in S. In particular, since $\hat Y - \bar Y$ lies in S, $(Y - \hat Y) \perp (\hat Y - \bar Y)$, and hence by the Pythagorean theorem


$$\text{total SSQ} = \|Y - \bar Y\|^2 = \|\hat Y - \bar Y\|^2 + \|Y - \hat Y\|^2$$


The quantity $\|\hat Y - \bar Y\|^2$ is the portion of the total sum of squares "due to" X and is called the regression sum of squares. The remaining portion of the total sum of squares, $\|Y - \hat Y\|^2$, is the "unexplained" portion and is called the residual sum of squares.


$$\text{total SSQ} = \text{regression SSQ} + \text{residual SSQ}$$

Dividing through by the total sum of squares gives

$$1 = \frac{\|\hat Y - \bar Y\|^2}{\|Y - \bar Y\|^2} + \frac{\|Y - \hat Y\|^2}{\|Y - \bar Y\|^2}$$

The first term is said to give the fraction of the total SSQ that is attributable to variation in X and the second to give the fraction of the total SSQ left unexplained by X.

Incidentally, reasoning geometrically from Figure 5.16 can be justified. We will see shortly that any flat may be coordinatized with an orthonormal coordinate system. Putting an orthonormal coordinate system on the flat defined by the origin, n⍴1, Y, and X turns it into a copy of Euclidean space.

FIGURE 5.16


EXAMPLE 5.16 Table 5.1 contains the weights (in kilograms) and heights (in inches) of 15 boys aged 9 to 11 engaged in competitive swimming. (The data are extracted from Table 1.3 in Exercises 1.3.) What percentage of the variation in weight in this group is attributable to variation in height?


Solution Let Y be the vector of weights and X the vector of heights. Doggedly applying Proposition 2.16, we have

      ⎕←TOTSSQ←(Y−Y⌹X∘.*,0)+.*2
473
      ⎕←RESSSQ←(Y−A+.×Y⌹A←X∘.*0 1)+.*2
218
      ⎕←REGSSQ←TOTSSQ−RESSSQ
255
      100×(REGSSQ,RESSSQ)÷TOTSSQ
53.9 46.1

Thus 54 percent of the variation in weight for this group can be attributed to variation in height, leaving 46 percent "unexplained" by variation in height alone.


Figure 5.16 gives more information.

Let l be the line through 0 and n⍴1. The vector $Y - \bar Y$ is perpendicular to l by construction, and, since $Y - \hat Y$ is perpendicular to all of S, it is in particular perpendicular to l. Thus the plane of the triangle containing these two vectors is perpendicular to l, and in particular

$$(\hat Y - \bar Y) \perp l$$

TABLE 5.1

Height | Weight
       | 25.30
       | 35.40
       | 33.60
       | 36.90
       | 28.60
       | 30.40
       | 29.10
       | 30.90
       | 24.10

Now if Z is a vector in $\mathbb{R}^n$ and $\bar z$ is the average of the components of Z, then the foot of the perpendicular from Z to l is n⍴$\bar z$. Thus the fact that $\hat Y - \bar Y$ is perpendicular to l means that the average of the components of $\hat Y$ is $\bar y$.

If we let $\bar x$ denote the average of the components of X and set $\bar X$ = n⍴$\bar x$, it follows that $X - \bar X$ is parallel to $\hat Y - \bar Y$, since both lie in the plane S and both are perpendicular to l. Hence, if $\theta$ is the angle between $X - \bar X$ and $Y - \bar Y$,

$$\frac{\text{regression SSQ}}{\text{total SSQ}} = \frac{\|\hat Y - \bar Y\|^2}{\|Y - \bar Y\|^2} = \cos^2\theta = r_{xy}^2$$

where $r_{xy}$ is the xy correlation coefficient (cf. Section 5.1). Thus, since $r_{xy}$ is symmetric in x and y, we have

$$\text{fraction of variation in } y \text{ due to } x = r_{xy}^2 = \text{fraction of variation in } x \text{ due to } y$$
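The decomposition total SSQ = regression SSQ + residual SSQ, and the identity regression SSQ/total SSQ = $r_{xy}^2$, can be checked numerically. The Python sketch below fits the least-squares line in closed form (slope $S_{xy}/S_{xx}$, which is what the normal equations give for the columns n⍴1 and X). The heights and weights are hypothetical values chosen for illustration, not the data of Table 5.1.

```python
import math


def ssq_decomposition(x, y):
    """Return (total SSQ, regression SSQ, residual SSQ) for the line y = c0 + c1 x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    c1 = sxy / sxx                     # least-squares slope
    c0 = ybar - c1 * xbar              # least-squares intercept
    yhat = [c0 + c1 * xi for xi in x]  # projection of Y onto span(ones, X)
    total = sum((yi - ybar) ** 2 for yi in y)
    regression = sum((yh - ybar) ** 2 for yh in yhat)
    residual = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    return total, regression, residual


def correlation(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only (not Table 5.1).
heights = [52.0, 54.5, 55.0, 57.5, 59.0, 61.0]
weights = [25.3, 28.6, 30.4, 29.1, 33.6, 35.4]

tot, reg, res = ssq_decomposition(heights, weights)
print(100 * reg / tot, 100 * res / tot)     # percent explained / unexplained
print(correlation(heights, weights) ** 2)   # equals reg / tot
```

Swapping the roles of `heights` and `weights` leaves `reg / tot` unchanged, which is the symmetry noted above.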


Proposition 5.16 gives one method of computing perpendicular projections. We will now develop an alternate approach that yields a method of constructing orthonormal coordinate systems as well. The method works even when $A^TA$ is singular; in this case ⌹ produces a DOMAIN ERROR.

If S is the column space of A and v is a vector in $\mathbb{R}^n$, then $v^\parallel$, the perpendicular projection of v onto S, is computed by first getting any solution X of

$$A^TAX = A^Tv$$

and then setting

$$v^\parallel = AX$$

Now if the columns of A are orthonormal, then $A^TA$ is an identity matrix, and so $A^TAX = A^Tv$ reduces to $X = A^Tv$. Thus $X = A^Tv$ and $v^\parallel = AA^Tv$.


PROPOSITION 5.17 Let the columns of U form an orthonormal basis of the subspace S of $\mathbb{R}^n$. Let v be a vector in $\mathbb{R}^n$. Then

the component of v parallel to S is $v^\parallel = UU^Tv$
the component of v perpendicular to S is $v^\perp = (I - UU^T)v$

where I = ID n.


To apply this proposition we need a method of constructing orthonormal bases of a subspace. In concrete terms this means that given a matrix V we wish to construct a matrix U with orthonormal columns and the same column space as V. It is quite useful to be able to do this in such a way that the column space of V[;⍳k], the first k columns of V, is contained in the column space of U[;⍳k], the first k columns of U. We do not demand that these column spaces be equal because we wish to allow V to have linearly dependent columns.

The method of constructing U described below is a variant of the Gram-Schmidt process and is intended for hand computation. A method more suited to the requirements of machine computation, the Householder algorithm, is developed in Section 5.5, where it is used to define an APL function ORTHO that returns U given V.


THE GRAM-SCHMIDT PROCESS 


Given a matrix V, we first construct a matrix W with orthogonal columns such that the column space of V[;⍳k] is contained in the column space of W[;⍳k]. We then define the final matrix U by $U[;i] = W[;i]/\|W[;i]\|$. In what follows let $v_i = V[;i]$, $w_i = W[;i]$, $u_i = U[;i]$.

Start with $w_1 = v_1$.

Next we need to choose $w_2 \perp w_1$ such that $[v_1 \mid v_2]$ and $[w_1 \mid w_2]$ have the same column space. Let $S_1$ be the subspace generated by $v_1$. Write $v_2 = v_2^\parallel + v_2^\perp$ with respect to $S_1$ and take $w_2 = v_2^\perp$ [Figure 5.17(a)]. Then $v_1, v_2$ and $w_1, w_2$ span the same subspace and $w_2 \perp w_1$. To get $w_3$ we let $S_2$ be the column space of $[w_1 \mid w_2]$ or $[v_1 \mid v_2]$, write $v_3 = v_3^\parallel + v_3^\perp$ with respect to $S_2$ [Figure 5.17(b)], and set $w_3 = v_3^\perp$.

In general suppose that $w_1, w_2, \ldots, w_k$ have been constructed such that $[w_1 \mid w_2 \mid \cdots \mid w_k]$ and $[v_1 \mid v_2 \mid \cdots \mid v_k]$ have the same column space and $w_i \perp w_j$ for $i \ne j$. Let $S_k$ be the column space of $[w_1 \mid w_2 \mid \cdots \mid w_k]$ and write $v_{k+1} = v_{k+1}^\parallel + v_{k+1}^\perp$ with respect to this subspace. Take $w_{k+1} = v_{k+1}^\perp$.

Claim $[v_1 \mid v_2 \mid \cdots \mid v_{k+1}]$ and $[w_1 \mid w_2 \mid \cdots \mid w_{k+1}]$ have the same column space.

Proof By induction assume that the column spaces of $[v_1 \mid \cdots \mid v_k] = M$ and $[w_1 \mid \cdots \mid w_k] = N$ are equal. We must show that $v_{k+1}$ is in the column space of $[N \mid w_{k+1}]$ and that $w_{k+1}$ is in the column space of $[M \mid v_{k+1}]$. Since $v_{k+1}^\parallel$ is in $S_k$, there are vectors $T_1$, $T_2$ such that $v_{k+1}^\parallel = MT_1 = NT_2$. Hence

$$v_{k+1} = NT_2 + w_{k+1}, \qquad w_{k+1} = v_{k+1} - MT_1$$

which proves the claim.

FIGURE 5.17 (a), (b)

Suppose that V has n columns. After n steps this process yields a matrix W with orthogonal columns and the same column space as V.

Now suppose that $v_3$, for example, had been a linear combination of $v_1$ and $v_2$. Then [see Figure 5.17(b)] $v_3$ would lie in $S_2$ and we would get $w_3 = v_3^\perp = 0$. Thus if V had linearly dependent columns, then W would have zero columns. The rank of W is the rank of $W^TW$, which is a diagonal matrix with $(W^TW)[i;i] = \|w_i\|^2$. Thus the rank of W is the number of nonzero columns, and deleting the zero columns from W gives an orthogonal basis for the column space of V. Hence setting $u_i = w_i/\|w_i\|$ for the nonzero $w_i$ gives an orthonormal basis of the column space of V.

Now consider the details of the computation. Suppose that $w_1, w_2, \ldots, w_k$ have been constructed. Let $W = [w_1 \mid w_2 \mid \cdots \mid w_k]$. Then

$$w_{k+1} = v_{k+1} - v_{k+1}^\parallel$$

where

$$v_{k+1}^\parallel = WX$$

and X is any solution of

$$W^TWX = W^Tv_{k+1}$$

The matrix $W^TW$ is diagonal. If we discard zero vectors $w_i$ as they occur, then $W^TW$ is invertible and $(W^TW)^{-1}$ is trivial to calculate, since the inverse of a diagonal matrix is obtained by inverting the diagonal entries. In this case

$$w_{k+1} = v_{k+1} - W(W^TW)^{-1}W^Tv_{k+1}$$

Notice, however, that any nonzero multiple of $w_{k+1}$ can be substituted for $w_{k+1}$. This will change no perpendicularity relations, and $w_{k+1}$ will be normalized in the end anyway. This means that at any stage of the computation we are free to multiply $v_{k+1} - W(W^TW)^{-1}W^Tv_{k+1}$ by any number that will eliminate unwanted fractions.

EXAMPLE 5.17 Apply the Gram-Schmidt process to the three vectors $v_1 = (1, 0, 1)$, $v_2 = (2, 1, 3)$, $v_3 = (1, 1, 1)$.

Solution We start with $w_1 = v_1$, so

$$W = \begin{bmatrix}1\\0\\1\end{bmatrix}$$

Then

$$w_2 \parallel v_2 - W(W^TW)^{-1}W^Tv_2 = \begin{bmatrix}2\\1\\3\end{bmatrix} - \frac{5}{2}\begin{bmatrix}1\\0\\1\end{bmatrix} = \frac{1}{2}\begin{bmatrix}-1\\2\\1\end{bmatrix}$$

so we take $w_2 = (-1, 2, 1)$ and

$$W = \begin{bmatrix}1 & -1\\ 0 & 2\\ 1 & 1\end{bmatrix}$$

Next, $W^TW = \begin{bmatrix}2 & 0\\ 0 & 6\end{bmatrix}$ and $W^Tv_3 = (2, 2)$, so

$$w_3 \parallel v_3 - W(W^TW)^{-1}W^Tv_3 = \begin{bmatrix}1\\1\\1\end{bmatrix} - \begin{bmatrix}1\\0\\1\end{bmatrix} - \frac{1}{3}\begin{bmatrix}-1\\2\\1\end{bmatrix} = \frac{1}{3}\begin{bmatrix}1\\1\\-1\end{bmatrix}$$

Taking $w_3 = (1, 1, -1)$,

$$W = \begin{bmatrix}1 & -1 & 1\\ 0 & 2 & 1\\ 1 & 1 & -1\end{bmatrix} \qquad\text{and}\qquad U = \begin{bmatrix}1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3}\\ 0 & 2/\sqrt{6} & 1/\sqrt{3}\\ 1/\sqrt{2} & 1/\sqrt{6} & -1/\sqrt{3}\end{bmatrix}$$

†For "∥" read "is parallel to."

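EXAMPLE 5.18 Let S be the subspace of $\mathbb{R}^4$ generated by (1, -1, 1, -1), (1, 0, 0, 1), (1, 2, 0, 0). What is the perpendicular distance from v = (1, 2, 3, 4) to S? What is the vector in S closest to v? Same questions for (0, 1, 1, 0).

Solution Writing $v = v^\parallel + v^\perp$, the distance is $\|v^\perp\|$ and the closest vector is $v^\parallel$. We use the Gram-Schmidt process to compute a matrix U whose columns are an orthonormal basis of S. We start with $w_1 = v_1 = (1, -1, 1, -1)$. Then

$$w_2 = v_2 - W(W^TW)^{-1}W^Tv_2 = \begin{bmatrix}1\\0\\0\\1\end{bmatrix} - \frac{0}{4}\begin{bmatrix}1\\-1\\1\\-1\end{bmatrix} = \begin{bmatrix}1\\0\\0\\1\end{bmatrix}$$

and, with $W = [w_1 \mid w_2]$,

$$w_3 \parallel v_3 - W(W^TW)^{-1}W^Tv_3 = \begin{bmatrix}1\\2\\0\\0\end{bmatrix} + \frac{1}{4}\begin{bmatrix}1\\-1\\1\\-1\end{bmatrix} - \frac{1}{2}\begin{bmatrix}1\\0\\0\\1\end{bmatrix} = \frac{1}{4}\begin{bmatrix}3\\7\\1\\-3\end{bmatrix}$$

Taking $w_3 = (3, 7, 1, -3)$ and

$$U = \begin{bmatrix}1/2 & 1/\sqrt{2} & 3/(2\sqrt{17})\\ -1/2 & 0 & 7/(2\sqrt{17})\\ 1/2 & 0 & 1/(2\sqrt{17})\\ -1/2 & 1/\sqrt{2} & -3/(2\sqrt{17})\end{bmatrix}$$

we have that $v^\parallel = UU^Tv$ for any v in $\mathbb{R}^4$ by Proposition 5.17. Computing both projections at once, we have

$$UU^T\begin{bmatrix}1 & 0\\ 2 & 1\\ 3 & 1\\ 4 & 0\end{bmatrix} = \frac{1}{34}\begin{bmatrix}80 & 12\\ 45 & 28\\ -13 & 4\\ 90 & -12\end{bmatrix}$$

so

$$(1, 2, 3, 4)^\parallel = \tfrac{1}{34}(80, 45, -13, 90), \qquad \|(1, 2, 3, 4)^\perp\| \approx 3.94$$
$$(0, 1, 1, 0)^\parallel = \tfrac{1}{17}(6, 14, 2, -6), \qquad \|(0, 1, 1, 0)^\perp\| \approx 1.03$$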
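The answers to Example 5.18 can be confirmed in exact arithmetic. When the columns of W are orthogonal and nonzero, $UU^T = W(W^TW)^{-1}W^T$, so the Python sketch below (helper names are illustrative) projects with W directly and avoids the square roots in U.

```python
from fractions import Fraction
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonal_basis(vectors):
    """Gram-Schmidt without normalization; zero vectors are dropped."""
    ws = []
    for v in vectors:
        v = [Fraction(t) for t in v]
        coefs = [dot(w, v) / dot(w, w) for w in ws]
        w_new = [v[i] - sum(c * w[i] for c, w in zip(coefs, ws)) for i in range(len(v))]
        if any(t != 0 for t in w_new):
            ws.append(w_new)
    return ws


def project(ws, v):
    """v-parallel = W (W^T W)^{-1} W^T v, which equals U U^T v."""
    v = [Fraction(t) for t in v]
    coefs = [dot(w, v) / dot(w, w) for w in ws]
    return [sum(c * w[i] for c, w in zip(coefs, ws)) for i in range(len(v))]


ws = orthogonal_basis([(1, -1, 1, -1), (1, 0, 0, 1), (1, 2, 0, 0)])
for v in [(1, 2, 3, 4), (0, 1, 1, 0)]:
    par = project(ws, v)
    perp = [a - b for a, b in zip(v, par)]
    print([str(t) for t in par], round(math.sqrt(dot(perp, perp)), 2))
# ['40/17', '45/34', '-13/34', '45/17'] 3.94
# ['6/17', '14/17', '2/17', '-6/17'] 1.03
```

Both perpendicular components turn out to be multiples of (-2, 1, 5, 2), as they must be, since $S^\perp$ is one-dimensional here.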


From time to time we have encountered reflections in the plane and space. In Section 5.5 we will need the concept of reflection in a subspace of $\mathbb{R}^n$. Notice that the following definition for subspaces coincides with the usual concept when n = 2, 3 (Figure 5.18).


DEFINITION 5.11 Let S be a subspace of $\mathbb{R}^n$ and let v be a vector in $\mathbb{R}^n$. The reflection $R_S(v)$ of v in S is $R_S(v) = v - 2v^\perp$.


PROPOSITION 5.18 Let S be a subspace of $\mathbb{R}^n$ and let the columns of U form an orthonormal basis for S. Then perpendicular projection onto S and reflection in S are linear transformations with matrices

$$P_S = UU^T, \qquad R_S = 2P_S - I$$

respectively. Further, $P_{S^\perp} = I - P_S$.

Proof By Proposition 5.17, $v^\parallel = UU^Tv$ and $v^\perp = (I - UU^T)v$ for any v in $\mathbb{R}^n$. Thus perpendicular projection, being multiplication by a matrix, is linear. Also

$$R_S(v) = v - 2v^\perp = [I - 2(I - P_S)]v = (2P_S - I)v$$


Notice that there are in general an infinite number of choices for U but only one $P_S$. The projection and reflection matrices have some special properties that are geometrically evident in $\mathbb{R}^2$ and $\mathbb{R}^3$ (apply the transformation twice).

The projection matrices satisfy

$$P_S^2 = P_S, \qquad P_S^T = P_S$$

and the reflection matrices satisfy

$$R_S^2 = I, \qquad R_S^T = R_S$$

The verification of these facts is left as exercise 32.

FIGURE 5.18

So far we have discussed projections and reflections only for subspaces, when clearly the concepts should apply to an arbitrary flat. If we want to discuss projection onto a flat F, we just translate to the origin, perform the projection, and translate back.

To be specific, suppose that p is a point in F. Let S be the unique flat through the origin parallel to F. To project X onto F, we translate F and X by -p, perform the projection, and translate back. The resulting transformation is affine. In fact

$$\begin{aligned}Y &= p + P_S(X - p)\\ &= (p - P_Sp) + P_SX\\ &= P_{S^\perp}p + P_SX\\ &= p^* + P_SX\end{aligned}$$

where $p^* = P_{S^\perp}p$.


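Notice that if q is another point in F, then $p - q$ is in S, and so

$$\begin{aligned}P_{S^\perp}p &= P_{S^\perp}\big(q + (p - q)\big)\\ &= P_{S^\perp}q + P_{S^\perp}(p - q)\\ &= P_{S^\perp}q + 0\end{aligned}$$

So the resulting formula is independent of the choice of p (Figure 5.19).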
For reflection in F we have

$$\begin{aligned}Y &= p + R_S(X - p)\\ &= (p - R_Sp) + R_SX\\ &= \big(I - (2P_S - I)\big)p + R_SX\\ &= 2P_{S^\perp}p + R_SX\end{aligned}$$

FIGURE 5.19




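EXAMPLE 5.19 Let F be the plane in $\mathbb{R}^3$ through the points $p_1 = (1, 0, 0)$, $p_2 = (0, 1, 0)$, and $p_3 = (0, 0, 1)$. Give algebraic expressions for the affine functions:

perpendicular projection onto F        reflection in F

Solution The subspace S parallel to F is generated by $v_1 = p_2 - p_1$ and $v_2 = p_3 - p_1$. We use the Gram-Schmidt process to find an orthonormal basis:

$$w_1 = v_1 = \begin{bmatrix}-1\\1\\0\end{bmatrix}, \qquad W = w_1$$

$$v_2 - W(W^TW)^{-1}W^Tv_2 = \begin{bmatrix}-1\\0\\1\end{bmatrix} - \frac{1}{2}\begin{bmatrix}-1\\1\\0\end{bmatrix} = \frac{1}{2}\begin{bmatrix}-1\\-1\\2\end{bmatrix}$$

So

$$W = \begin{bmatrix}-1 & -1\\ 1 & -1\\ 0 & 2\end{bmatrix}$$

gives an orthogonal basis of S and

$$U = \begin{bmatrix}-1/\sqrt{2} & -1/\sqrt{6}\\ 1/\sqrt{2} & -1/\sqrt{6}\\ 0 & 2/\sqrt{6}\end{bmatrix}$$

gives an orthonormal basis of S. Thus

$$P_S = UU^T = W(W^TW)^{-1}W^T = \frac{1}{6}\begin{bmatrix}4 & -2 & -2\\ -2 & 4 & -2\\ -2 & -2 & 4\end{bmatrix} = \frac{1}{3}\begin{bmatrix}2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2\end{bmatrix}$$

$$p^* = P_{S^\perp}p_1 = p_1 - P_Sp_1 = \begin{bmatrix}1\\0\\0\end{bmatrix} - \frac{1}{3}\begin{bmatrix}2\\-1\\-1\end{bmatrix} = \frac{1}{3}\begin{bmatrix}1\\1\\1\end{bmatrix}$$

and

$$R_S = 2P_S - I = \frac{2}{3}\begin{bmatrix}2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2\end{bmatrix} - \frac{1}{3}\begin{bmatrix}3 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 3\end{bmatrix} = \frac{1}{3}\begin{bmatrix}1 & -2 & -2\\ -2 & 1 & -2\\ -2 & -2 & 1\end{bmatrix}$$

Hence

perpendicular projection onto F: $Y = \frac{1}{3}\begin{bmatrix}1\\1\\1\end{bmatrix} + \frac{1}{3}\begin{bmatrix}2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2\end{bmatrix}X$

reflection in F: $Y = \frac{2}{3}\begin{bmatrix}1\\1\\1\end{bmatrix} + \frac{1}{3}\begin{bmatrix}1 & -2 & -2\\ -2 & 1 & -2\\ -2 & -2 & 1\end{bmatrix}X$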
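The same construction works for any flat given by points. The Python sketch below (function names are illustrative) builds $P_S$ from an orthogonal basis of the parallel subspace S, sets $R_S = 2P_S - I$ and $p^* = p - P_Sp$, and returns the affine maps $X \mapsto p^* + P_SX$ and $X \mapsto 2p^* + R_SX$. Exact fractions keep the output directly comparable with Example 5.19.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonal_basis(vectors):
    """Gram-Schmidt without normalization; zero vectors are dropped."""
    ws = []
    for v in vectors:
        v = [Fraction(t) for t in v]
        coefs = [dot(w, v) / dot(w, w) for w in ws]
        w_new = [v[i] - sum(c * w[i] for c, w in zip(coefs, ws)) for i in range(len(v))]
        if any(t != 0 for t in w_new):
            ws.append(w_new)
    return ws


def flat_maps(points):
    """Projection onto, and reflection in, the flat through the given points."""
    p = [Fraction(t) for t in points[0]]
    n = len(p)
    # S is spanned by the differences q - p
    ws = orthogonal_basis([[Fraction(a) - b for a, b in zip(q, p)] for q in points[1:]])
    # P_S = W (W^T W)^{-1} W^T, a sum of rank-one terms w w^T / (w . w)
    P = [[sum(w[i] * w[j] / dot(w, w) for w in ws) for j in range(n)] for i in range(n)]
    R = [[2 * P[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    p_star = [p[i] - dot(P[i], p) for i in range(n)]  # P_{S-perp} p

    def project(x):
        return [p_star[i] + dot(P[i], x) for i in range(n)]

    def reflect(x):
        return [2 * p_star[i] + dot(R[i], x) for i in range(n)]

    return P, R, p_star, project, reflect


P, R, p_star, project, reflect = flat_maps([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
print([[str(t) for t in row] for row in P])
print([str(t) for t in p_star])
# [['2/3', '-1/3', '-1/3'], ['-1/3', '2/3', '-1/3'], ['-1/3', '-1/3', '2/3']]
# ['1/3', '1/3', '1/3']
```

Checking that `P` times `P` gives `P` and that `R` times `R` gives the identity is a quick test of the properties listed after Proposition 5.18.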


The vector $p^* = P_{S^\perp}p$ in Figure 5.19 is the point in F closest to the origin. We record this useful fact for future reference. The proof is left as exercise 33.

PROPOSITION 5.19 Let F be a k-flat in $\mathbb{R}^n$ and let S be the unique k-flat through the origin parallel to F. The point in F closest to the origin is $p^* = P_{S^\perp}p$, where p is any point in F.




EXERCISES 5.4 


In exercises 1 through 10 a set S is given. Find a basis of $S^\perp$.


1. S = {(1, 2)} in $\mathbb{R}^2$        2. S = {(a, b)} in $\mathbb{R}^2$
3. S = $\mathbb{R}^n$        4. S = {0} in $\mathbb{R}^n$

5. $= (1-2, -4#).(, <b) in RE 

6 S=((—h —h =D,(-2,3,5.7)(—2,0, 4, —8)} in Rt 

i (11,0), (=2, 1,3) (= 1,0.) in Rt 

8. {Ch = 1,0), (2, 3,5, 19.(0, 1,3,2)} in RE 

9 {(, =1, =2,0, =1).(1,0,0, 1.8), (=2, 3, 6,2, 16), 


(0, —2, =4, -3, =23)} in R® 

10. Sis the set of solutions of 

r=0 
=0 


Kty- 
dv + 4y 


11. (a) Let S be a subspace of $\mathbb{R}^n$. Show that $(S^\perp)^\perp = S$.
(b) Use (a) to solve exercise 10.

12. Use exercise 11 to show that if S is the solution space of a system of homogeneous linear equations, then $S^\perp$ is the space generated by the coefficient vectors.


In exercises 13 through 17 a matrix A and a vector v are given. Use Proposition 5.16 to find the components of v parallel and perpendicular to the column space of A. What is the distance from v to the column space of A?


1 
3, v=(1)) Al 
1 1 
M4 L29,4=]1 =I 
1 1 
tt 
18. v=(,0..4=0 1 
o4 
129 
16, (Computer assignment) » =(1,—1 1. = |} tO 
-1 3 4 
1-3 2 
17. (Computer assignment) v Pa pene 
1 46 


(Computer assignment) Exercises 18, 19, and 20 refer to Table 1.3 of Exercises 1.3. 


18. What percentage of the posttest weight variation of the experimental group can be attributed to posttest height variation?




19. What percentage of the pretest weight variation of the control group can be attributed to pretest height variation?


20. What percentage of the posttest weight variation of the control group can be attributed to posttest height variation?
Apply the Gram-Schmidt process to the sets of vectors given in exercises 21 through 25.


21 (1.0), 1) 22. (1, D.(2,2,2,3) 
23. (1, 1, 0), (1, 0, 1), (0, 1, 1)        24. (1, 0, 1), (1, 1, 0), (0, 1, 1)


25. (11, =1. 1, =,.01,0, 1,0), 0, 1.0, DL) 


In exercises 26 through 31 a subspace S and a vector v are given. Compute the matrices $P_S$, $P_{S^\perp}$, and $R_S$. Use these matrices to compute $v^\parallel$, $v^\perp$, and the distance from v to S.


26. v = (1, 1), S generated by (1, 2)
27. v = (1, 1), S generated by (1, -1)
28. v = (1, 1), S generated by (3, -3)
29. v = (1, 2, 3), S generated by (1, 0, 0), (1, 1, 1)
30. v = (1, 2, 3), S generated by (1, 1, 1), (1, 0, 1)


1, = 1) 0,1,.0, 1), 0,0, 1) 
Ry = Rand Ry 


31. v= (1.0.1.0), 5 generated by (1, ~ 
32. Let § be a subspace of R*. Show Py = PE, P} = 
33, Prove Proposition 5,19, 


Hint; Let q be another point in the flat and show that the Pythagorean theorem 
applies to the triangle with sides p'. q — p*, and q. 


Ry 


In exercises 34 through 49 a k-flat F in $R^n$ is given. Find expressions for the affine functions consisting of perpendicular projection onto F and reflection in F. What is the point in F closest to the origin?

34. The line in $R^2$ through the points (−1, 0) and (0, −1)

35. The line $x + y = 1$ in $R^2$

36. The line through (−1, 1) and (0, 3)

37. The point (1, 1). (A point is a 0-flat.)
Hint: The subspace {(0, 0)} is the column space of U = 2 0⍴0. (Just go to the terminal and type (2 0⍴0)+.×⍳0 if you don't believe it.) Further, since $U^TU$ = ID 0, we take U to have "orthonormal columns."


38. The plane in $R^3$ through (1, 0, 0), (0, −1, 0), and (0, 0, −1)
39. The plane in $R^3$ through (1, 4, 1), (−6, 12, 1), and (7, 7, 1)

40. The line in $R^3$ through (1, 1, 1) and (1, 1, 0)

41. The line in $R^3$ through (1, 0, 1) and (0, −1, 1)

42. The point (6, −7, 8) in $R^3$ (see the hint for exercise 37)

43. The 2-flat in $R^4$ through (0, 0, 1, 1), (1, 1, 1, −…), and (1, 0, 1, 0)
44. The 2-flat in $R^4$ through (1, 0, 1, 0), (−1, 1, −1, 1), and (1, 0, 0, 1)
45. The 1-flat in $R^4$ through (1, 0, −1, 1) and (1, −1, 1, 0)


342 Orthogonality 


46. The 1-flat in $R^4$ through (1, 1, 1, 1) and (0, 1, 0, 1)

47. The 3-flat in $R^4$ through (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1)

48. The 3-flat in $R^4$ through (−1, 0, 0, 1), (1, 0, −1, 0), (1, −1, 0, 0), and (0, 0, 1, −1)
49. The 0-flat (1, 7, 6, −10) in $R^4$ (see the hint for exercise 37)


5.5 The Householder Algorithm 
(Automatic Orthonormalization) 


The Gram-Schmidt algorithm introduced in Section 5.4 is unreliable in the presence of round-off error (see exercise 1). For this reason another algorithm, due to A. S. Householder, is often used for machine computations. In this section we develop the Householder algorithm and use it to define APL functions for the computation of orthonormal bases and perpendicular projections.
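The following minimal Python sketch makes the round-off problem concrete. It is our own illustration, not code from the text, and uses Python rather than APL. It applies classical Gram-Schmidt to the columns (1, ε, 0, 0), (1, 0, ε, 0), (1, 0, 0, ε). This is a standard test matrix often attributed to Läuchli. We assume ε = 1e-8, so that 1 + ε² rounds to 1 in IEEE double precision.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def classical_gram_schmidt(columns):
    """Orthonormalize columns; every projection coefficient is taken
    against the ORIGINAL column a, as in classical Gram-Schmidt."""
    us = []
    for a in columns:
        coeffs = [dot(u, a) for u in us]
        v = list(a)
        for c, u in zip(coeffs, us):
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        us.append([vi / norm for vi in v])
    return us

eps = 1e-8                      # eps**2 is lost when added to 1.0
cols = [[1.0, eps, 0.0, 0.0],
        [1.0, 0.0, eps, 0.0],
        [1.0, 0.0, 0.0, eps]]
U = classical_gram_schmidt(cols)
print(dot(U[1], U[2]))          # close to 0.5, although it should be 0
```

The second and third computed vectors ought to be orthogonal. Their dot product is about 1/2 instead.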

We begin by looking at the Gram-Schmidt process from a different angle. Given a set of vectors, let us store them as the columns of a matrix A. Applying the Gram-Schmidt process to the columns of A produces a matrix U with several nice properties. The column spaces of U and A are the same, and the columns of U form an orthonormal set of vectors; $U^TU = I$. Thus the columns of U provide an orthonormal basis for the column space of A. Often this is all that is needed, but more is true. The kth column of A is in fact a linear combination of no more than the first k columns of U. (Recall that if the columns of A are not independent, then U will have fewer columns than A.) Suppose that


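$$A[;k] = U[;\iota k]\,T_k$$

for some vector of coefficients $T_k$. Define a matrix R by

$$R[;k] = r \uparrow T_k$$

where r is the number of columns of U (= the rank of A), so that column k of R is $T_k$ followed by zeros. It follows that

$$A = UR$$

where R[i;k] = 0 whenever i > k.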


The form exhibited for the matrix R is a weak version of a row echelon form,
which we will call upper echelon form.
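Below is a short Python sketch of the factorization A = UR. It is our illustration; the function name `gram_schmidt_ur` and the relative tolerance `tol` are hypothetical. It uses the modified form of Gram-Schmidt, which projects the running remainder rather than the original column. Columns whose remainder is negligible are skipped, and R is then formed as $U^TA$. The sample matrix has a dependent third column, so U comes out with only two columns.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt_ur(columns, tol=1e-12):
    """Return (U, R): U is a list of orthonormal columns and R = U^T A.

    A column whose remainder is negligible (relative to its own length)
    adds no new column to U, so U can have fewer columns than A."""
    us = []
    for a in columns:
        v = [float(x) for x in a]
        for u in us:                       # modified Gram-Schmidt step
            c = dot(u, v)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > tol * math.sqrt(dot(a, a)):
            us.append([vi / norm for vi in v])
    R = [[dot(u, a) for a in columns] for u in us]   # R[i][k] = u_i . a_k
    return us, R

# Columns of a 4-by-3 matrix whose third column is the sum of the first two.
A_cols = [[1, 4, 2, 1], [6, 3, -3, -2], [7, 7, -1, -1]]
U, R = gram_schmidt_ur(A_cols)
print(len(U))                       # the rank of A
for row in R:
    print([round(x, 3) for x in row])
```

R has zeros (up to rounding) below its first pivot. Multiplying U by R column by column gives back A.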


5.5 The Householder Algorithm 343 


DEFINITION 5.12 A matrix R is said to be in upper echelon form if


1. The zero rows, if any, of R are at the bottom.
Note: The first nonzero entry of each nonzero row will be called a pivot entry.
2. Each pivot entry has only zeros below it in the same column.


3. The pivot entry in a given row is to the right of the pivot entries in the rows
above; that is, if R[i;j] and R[k;l] are pivot entries and i < k, then j < l.


The term “pivot entry” is used to indicate that if one were to row-reduce R to row echelon form, the pivot entries would be used for pivots. An upper echelon form differs from an echelon form in that the pivot entries need not be ones and they may have nonzero entries above them in the same column. A matrix has only one echelon form but may be row-reduced to many different upper echelon forms.
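Definition 5.12 translates into a short check. The Python function below is a sketch; the names `is_upper_echelon` and `pivot_index` and the tolerance `tol` are our own. It tests conditions 1 and 3 directly, because together they imply condition 2. An entry below a pivot lies to the left of the lower row's pivot, so it is zero.

```python
def pivot_index(row, tol):
    """Column of the first entry with |x| > tol, or None for a zero row."""
    for j, x in enumerate(row):
        if abs(x) > tol:
            return j
    return None

def is_upper_echelon(R, tol=0.0):
    last_pivot = -1
    seen_zero_row = False
    for row in R:
        j = pivot_index(row, tol)
        if j is None:
            seen_zero_row = True          # condition 1: zero rows at the bottom
        elif seen_zero_row or j <= last_pivot:
            return False                  # nonzero row below a zero row, or a
                                          # pivot not strictly to the right
        else:
            last_pivot = j
    return True

# Pivots 2 and 3 need not be 1, and entries above a pivot may be nonzero.
print(is_upper_echelon([[2, 5, 1], [0, 3, 4]]))     # True
print(is_upper_echelon([[2, 1], [1, 3]]))           # False
```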

Some special terminology is associated with the equation A = UR above. 


DEFINITION 5.13 Let A be a matrix. A QR factorization of A is an expression

$$A = QR$$

where Q has orthonormal columns and R is in upper echelon form.
Note that if A = QR is a QR factorization, then multiplication by $Q^T$ gives

$$Q^TA = Q^TQR = R$$


Thus multiplication by $Q^T$ “reduces A to upper echelon form.” By our definitions
Q need not be invertible, however (it need not be square), and so R need not
result from row-reducing A. In the Householder algorithm described below, an
invertible Q is used.


EXAMPLE 5.20 In Example 5.17 of the previous section the Gram-Schmidt process was applied to the set of vectors $v_1 = (1, 0, 1)$, $v_2 = (2, 1, 3)$, and $v_3 = (1, 1, \dots)$, producing a matrix U with orthonormal columns. In this case we let the columns of A be $v_1, v_2, v_3$. By the remark above, $R = U^TA$.


The Gram-Schmidt process produces QR factorizations. Conversely, any algorithm that produces QR factorizations can be used in place of the Gram-Schmidt process.

The Householder algorithm constructs QR factorizations by a process similar to row reduction. In fact, ⌹ uses the process to solve linear equations. The idea is to use orthogonal matrices instead of elementary matrices to row-reduce a matrix to upper echelon form. To do this we need an orthogonal matrix that can be used in place of pivot matrices to clean up a column.

First we describe how such an orthogonal matrix can be constructed. The construction results from the observation that if v and w are any two vectors in $R^n$ with $\|v\| = \|w\|$, then there is a reflection matrix R such that v = Rw and w = Rv. In fact, R can be taken to be the reflection in the subspace generated by v + w [Figure 5.20(a)].

By Proposition 5.18 we may take $R = 2UU^T - I$, where U is a single-column matrix with $U[;1] = (v + w)/\|v + w\|$.

The trick is to start with a vector v and set $w = (\|v\|, 0, 0, \dots, 0)$; then
$\|v\| = \|w\|$ and w = Rv has zeros in every component but the first.
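Here is a minimal Python sketch of the construction. It is our illustration, and `reflection_onto_axis` is a hypothetical name. It builds $R = 2uu^T - I$ with $u = (v + w)/\|v + w\|$ and $w = (\|v\|, 0, \dots, 0)$, then checks that Rv = w. It assumes $v + w \neq 0$. The case where v is close to $-w$ is the one handled by the sign choice described in the text that follows.

```python
import math

def reflection_onto_axis(v):
    """Return R = 2 u u^T - I, the reflection in the line through v + w,
    where w = (||v||, 0, ..., 0).  Assumes v + w != 0."""
    n = len(v)
    norm_v = math.sqrt(sum(x * x for x in v))
    s = [v[0] + norm_v] + list(v[1:])          # s = v + w
    norm_s = math.sqrt(sum(x * x for x in s))
    u = [x / norm_s for x in s]
    return [[2 * u[i] * u[j] - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

v = [3.0, 4.0, 0.0, 12.0]        # ||v|| = 13
R = reflection_onto_axis(v)
print([round(x, 12) for x in matvec(R, v)])   # approximately [13, 0, 0, 0]
```

Because R is a reflection, applying it twice gives the identity. This also means R sends w back to v.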


FIGURE 5.20 (panels (a) and (b))




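Given a matrix A, we take v = A[;1] and construct R in such a way that
$(\|v\|, 0, \dots, 0) = Rv$. Then

$$RA = \begin{bmatrix} \|v\| & * \\ 0 & * \end{bmatrix}$$

so RA has zeros in the first column below the 1;1 position.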


We will call the reflection R a special reflection. Such reflections are members 
of a general class of reflections called elementary reflectors or Householder trans- 
formations. We will not define the more general class of reflections, since they will 
not be needed below. 

In practice we do not always take $U[;1] = (v + w)/\|v + w\|$. If v is close to $-w$ [Figure 5.20(b)], then $\|v + w\|$ will be small and dividing by $\|v + w\|$ may cause numerical problems. In this case it would be better to reflect in the subspace generated by $v - w$, that is, to reflect v into $(-\|v\|, 0, \dots, 0)$.

Because of this problem the APL function SPRFLCTR defined below
chooses R in such a way that the signs of 1↑v and 1↑w are the same unless
1↑v is 0, in which case we take 0 < 1↑w. If 1↓v = 0, take R = ID ⍴v.


    ∇ Z←SPRFLCTR V;L
[1]   Z←ID⍴V
[2]   L←(V+.*2)*0.5
[3]   →((L=0)∨L=|V[1])/0
[4]   V[1]←V[1]+L×(V[1]≥0)-V[1]<0
[5]   Z←(2×V∘.×V÷V+.*2)-Z
    ∇


On line [1] the result is provisionally set equal to an identity matrix, and on line [2] L is set equal to $\|v\|$. The function stops on line [3] if $\|v\| = 0$ or if 1↓v is negligible compared to v (that is, L equals |v[1]| within the comparison tolerance).

Note that since 1↓w = 0, the operation v + w affects only v[1], by adding ±L to v[1]. The sign of L is chosen and v + w formed on line [4]. On line [5] the matrix $UU^T$ is formed as an outer product of vectors.
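For readers without an APL system, here is a Python sketch that follows SPRFLCTR line by line. It is a port we wrote for illustration, not the book's code. There is one deliberate difference on line [3]. Instead of APL's fuzzy comparison L = |v[1]|, it tests whether the norm of the entries after the first is at most `rel_tol` times ‖v‖. `rel_tol` is our own parameter.

```python
import math

def sprflctr(v, rel_tol=1e-13):
    """Return an orthogonal, symmetric R with R v = (+-||v||, 0, ..., 0).

    The first entry of w takes the sign of v[0] (positive when v[0] == 0),
    so forming v + w never subtracts nearly equal numbers."""
    n = len(v)
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # [1]
    L = math.sqrt(sum(x * x for x in v))                                 # [2]
    tail = math.sqrt(sum(x * x for x in v[1:]))
    if L == 0 or tail <= rel_tol * L:                                    # [3]
        return Z
    s = list(v)
    s[0] += L if v[0] >= 0 else -L                                       # [4]
    ss = sum(x * x for x in s)
    return [[2 * s[i] * s[j] / ss - Z[i][j] for j in range(n)]           # [5]
            for i in range(n)]

v = [-3.0, 4.0, 0.0, 12.0]
R = sprflctr(v)
print([round(sum(R[i][j] * v[j] for j in range(4)), 12) for i in range(4)])
```

Here v[0] is negative, so the result is (−13, 0, 0, 0). The vector v + w = (−16, 4, 0, 12) is formed with no cancellation.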

We are now ready to describe the Householder algorithm for reducing a matrix to upper echelon form. As we did with Gaussian reduction, we will use a version of the algorithm that returns a matrix Q, orthogonal this time, such that $Q^TA = R$ is upper echelon. In this case, however, we are more interested in Q than in R.


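HOUSEHOLDER ALGORITHM Let the matrix A be m by n.


1. Delete any zero columns from A. If A is a zero matrix, take Q = ID m and stop. (Recall our convention that a matrix without rows or without columns is considered to be a zero matrix.)


2. Now let $Q_1$ = SPRFLCTR A[;1]. The matrix $Q_1$ is orthogonal and $Q_1 = Q_1^T$. The matrix $Q_1A$ has zeros in the first column below the 1;1 position.


3. Let $\bar A$ = 1 1↓Q1+.×A. By induction on the number of rows we may assume that we have constructed an orthogonal matrix $\bar Q$ such that

$$\bar R = \bar Q^T\bar A$$

is an upper echelon form. Define the matrix Q by

$$Q = Q_1\begin{bmatrix} 1 & 0 \\ 0 & \bar Q \end{bmatrix}$$

and stop.


To see that this construction works, notice first that $Q^TA = R$ is upper echelon. In fact, writing $Q_1A = \begin{bmatrix} r_{11} & c^T \\ 0 & \bar A \end{bmatrix}$,

$$R = Q^TA = \begin{bmatrix} 1 & 0 \\ 0 & \bar Q^T \end{bmatrix}Q_1A = \begin{bmatrix} 1 & 0 \\ 0 & \bar Q^T \end{bmatrix}\begin{bmatrix} r_{11} & c^T \\ 0 & \bar A \end{bmatrix} = \begin{bmatrix} r_{11} & c^T \\ 0 & \bar R \end{bmatrix}$$

(The possible presence of columns of zeros in the original A does not substantially affect this computation.) The matrix Q is orthogonal, since

$$Q^TQ = \begin{bmatrix} 1 & 0 \\ 0 & \bar Q^T \end{bmatrix}Q_1^TQ_1\begin{bmatrix} 1 & 0 \\ 0 & \bar Q \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \bar Q^T\bar Q \end{bmatrix} = I.$$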

The description of the Householder algorithm given above translates immediately into APL notation. In the function HHLDR given below, the scalar B sets the scale for fuzzy comparisons. That is, a number x is taken to be zero if x is negligible compared to B (B = B + x).


    ∇ Q←B HHLDR A;Q1
[1]   A←(∨⌿B≠B+A)/A
[2]   Q←ID 1↑⍴A
[3]   →(0=1↓⍴A)/0
[4]   Q1←SPRFLCTR A[;1]
[5]   Q←B HHLDR 1 1↓Q1+.×A
[6]   Q←Q1+.×((1+1↑⍴Q)↑1),[1]0,Q
    ∇


Line [1] deletes any columns from A that are negligible compared to B. The expression ((1+1↑⍴Q)↑1),[1]0,Q in line [6] constructs the matrix

$$\begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$$

For example, if Q is

    1 2
    3 4

then 0,Q is

    0 1 2
    0 3 4

and (1+1↑⍴Q)↑1 is 1 0 0. Finally ((1+1↑⍴Q)↑1),[1]0,Q is

    1 0 0
    0 1 2
    0 3 4

In our applications the number B can be taken to be the largest magnitude in
the matrix A. In this case a function such as

    ∇ Q←HSHLDR A
[1]   Q←(⌈/,|A)HHLDR A
    ∇

is convenient.


EXAMPLE 5.21 Compute a QR factorization of

$$A = \begin{bmatrix} 1 & 6 & 7 \\ 4 & 3 & 7 \\ 2 & -3 & -1 \\ 1 & -2 & -1 \end{bmatrix}$$


Check the accuracy of your answer. 




Solution

      ⎕←Q←HSHLDR A
0.213  0.758  0.515  0.338
0.853  0.162 ¯0.439 ¯0.232
0.426 ¯0.535  0.713 ¯0.155
0.213 ¯0.336 ¯0.184  0.899

      ⎕←R←(⍉Q)+.×A
4.69E0    2.13E0    6.82E0
9.54E¯18  7.31E0    7.31E0
8.67E¯18  3.82E¯17  4.16E¯17
5.64E¯18  1.39E¯17  2.08E¯17


⎕CT for the processor used is about 3E¯15, so R can be taken to be an upper
echelon form. To check the orthogonality of Q we see whether $Q^T$ will do for $Q^{-1}$.


      (ID 4)-(⍉Q)+.×Q
0.00E0    6.51E¯19  1.19E¯18  1.25E¯18
6.51E¯19  1.73E¯18  0.00E0    0.00E0
1.19E¯18  0.00E0    0.00E0    2.17E¯19
1.25E¯18  0.00E0    2.17E¯19  ¯1.73E¯18


Since R was constructed as $Q^TA$, this check is much the same as comparing A
to $QR = QQ^TA$.

      A-Q+.×R
8.67E¯18  0.00E0    6.94E¯18
0.00E0    6.94E¯18  1.39E¯17
0.00E0    6.94E¯18  1.04E¯17
0.00E0    1.99E¯17  1.04E¯17


We started this section with the problem of constructing an orthonormal basis of the column space of the matrix A. The output of HHLDR, however, is a square matrix: the column space of the matrix Q = HSHLDR A is all of $R^m$, where m is the number of rows of A. It is quite easy to extract an orthonormal basis of the column space of A from Q.

Since Q is invertible, the matrices A and R have the same row space; in particular, they have the same rank (Propositions 4.25 and 4.26). Since R is in upper echelon form, its rank is the number of nonzero rows. This is because it can be put into echelon form by pivoting on the pivot entries, and there is one pivot entry in each nonzero row.

Let us apply these facts to the matrices of Example 5.21. The matrix R has rank 2; its last two rows can be taken to be zero. Thus,

$$A = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}\begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1R_1$$

where $Q_1$ = Q[;⍳2] and $R_1$ = R[⍳2;]. The column space of A is thus contained in the column space of Q[;⍳2]. But these two spaces are both two-dimensional, hence they are equal.

The argument is true in general. The rank of A is the number of nonzero rows
of R. If the rank of A is r, then U = Q[;⍳r] is a matrix whose columns form an
orthonormal basis of the column space of A. The function ORTHO returns U
given A.


    ∇ Z←ORTHO A;R;B
[1]   B←⌈/,|A←((⍴A),(1=⍴⍴A)⍴1)⍴A
[2]   R←(⍉Z←B HHLDR A)+.×A
[3]   Z←Z[;⍳+/∨/B≠B+R]
    ∇


The rank of R is taken to be the number of rows that are not negligible compared to B, the largest magnitude in A. The expression A←((⍴A),(1=⍴⍴A)⍴1)⍴A leaves A unchanged if A is a matrix but makes A into a single-column matrix if it is a vector. This is done because ORTHO is sometimes more convenient to use if it accepts vector arguments.


EXAMPLE 5.22 Let S be the column space of the matrix A of Example 5.21.
Compute $P_S$, the perpendicular projection matrix for S, and $R_S$, the reflection
matrix for S. Find $v^{\parallel}$ and $v^{\perp}$ with respect to S if v is ⍳4.


Solution Using Proposition 5.18 of the previous section,

      ⎕←U←ORTHO A
0.213  0.758
0.853  0.162
0.426 ¯0.535
0.213 ¯0.336

      ⎕←PS←U+.×⍉U
 0.621  0.304 ¯0.315 ¯0.209
 0.304  0.753  0.277  0.128
¯0.315  0.277  0.468  0.270
¯0.209  0.128  0.270  0.158




As a check, we should have $P_S^2 = P_S$.


      PS-PS+.×PS
0.00E0    1.90E¯18  8.67E¯19  2.17E¯19
1.30E¯18  1.73E¯18  0.00E0    0.00E0
8.67E¯19  0.00E0    4.94E¯19  ¯4.94E¯19
2.17E¯19  0.00E0    4.34E¯19  0.00E0


For $v^{\parallel}$ and $v^{\perp}$ we have

      ⎕←V←⍳4
1 2 3 4
      ⎕←VPAR←PS+.×V
¯0.551 3.15 2.72 1.49
      ⎕←VPERP←V-VPAR
1.55 ¯1.15 0.276 2.51


To check $v^{\perp}$, use the fact that the dot product of $v^{\perp}$ with any column of A
should be zero.

      VPERP+.×A
5.98E¯17 1.67E¯16 2.8E¯16


The reflection matrix is

      ⎕←RS←(2×PS)-ID 4
 0.241  0.609 ¯0.629 ¯0.418
 0.609  0.507  0.554  0.255
¯0.629  0.554 ¯0.064  0.541
¯0.418  0.255  0.541 ¯0.684


As a check, we should have $R_S^2 = I$:

      (ID 4)-RS+.×RS
1.73E¯18  4.55E¯18  3.90E¯18  1.52E¯18
4.55E¯18  5.20E¯18  0.00E0    0.00E0
3.90E¯18  0.00E0    3.47E¯18  2.60E¯18
1.52E¯18  0.00E0    2.00E¯18  ¯1.79E¯18
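The computations of Example 5.22 can be reproduced in Python. This sketch is ours, not the book's. The 4-by-3 matrix below is our reading of the matrix A of Example 5.21; we chose it because it reproduces the printed values, but treat it as an assumption. The columns are orthonormalized by modified Gram-Schmidt rather than by HHLDR. The sketch then forms $P_S = UU^T$ and $R_S = 2P_S - I$, and splits v = ⍳4 into its two components.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def ortho(columns, tol=1e-12):
    """Orthonormal basis of the span of the columns (modified Gram-Schmidt)."""
    us = []
    for a in columns:
        v = [float(x) for x in a]
        for u in us:
            c = dot(u, v)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > tol * math.sqrt(dot(a, a)):
            us.append([vi / norm for vi in v])
    return us

A_cols = [[1, 4, 2, 1], [6, 3, -3, -2], [7, 7, -1, -1]]   # columns of A (assumed)
U = ortho(A_cols)
n = 4
PS = [[sum(u[i] * u[j] for u in U) for j in range(n)] for i in range(n)]   # U U^T
RS = [[2 * PS[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
V = [1.0, 2.0, 3.0, 4.0]
VPAR = [dot(row, V) for row in PS]
VPERP = [x - p for x, p in zip(V, VPAR)]
print([round(x, 3) for x in VPAR])    # compare with the printed VPAR
print([round(x, 3) for x in VPERP])
```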


What about the columns of Q = HSHLDR A that are not used by ORTHO?
They form an orthonormal basis of the orthogonal complement of the column
space of A. We collect these facts together for easy reference in the next proposition.


PROPOSITION 5.20 Let A be a matrix of rank r with column space S. Suppose that
A = QR, where Q is orthogonal and R is in upper echelon form. Then the first r
columns of Q form an orthonormal basis of S and the remaining columns form an
orthonormal basis of $S^\perp$, the orthogonal complement of S.


Proof Let A be m by n. Then Q is m by m with $Q^T = Q^{-1}$. The rank of Q is m, hence the column space of Q is all of $R^m$. Since $Q^TQ = I$, the columns of Q form an orthonormal basis of $R^m$. Choose an arbitrary subset, Q[;J], of the columns of Q. Let the column space of Q[;J] be T, and let $T'$ be the space generated by the remaining columns. Then $T'$ is contained in $T^\perp$, since each generator of $T'$ is orthogonal to every generator of T and hence is contained in $T^\perp$. But $T'$ and $T^\perp$ have the same dimension, since $\dim T + \dim T' = m = \dim T + \dim T^\perp$, so $T' = T^\perp$.

Thus, to finish the proof of the proposition it only remains to identify the
subspace S with the column space of Q[;⍳r]. As indicated above, R and A have the same row space by
Proposition 4.26 and hence the same rank r by Proposition 4.27.

By comparing the definitions of echelon form and upper echelon form we see 
that R can be row-reduced to echelon form by pivoting on each pivot entry and 
then dividing the row by the pivot entry to obtain a leading 1. 

Thus

$$\begin{aligned}
\operatorname{rank} R &= \text{number of pivot columns}\\
&= \text{number of leading 1's}\\
&= \text{number of pivot entries}\\
&= \text{number of nonzero rows of } R
\end{aligned}$$


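Since the zero rows of R are all at the bottom, writing $Q_r$ = Q[;⍳r] and $R_r$ = R[⍳r;],

$$A = QR = \begin{bmatrix} Q_r & \tilde Q \end{bmatrix}\begin{bmatrix} R_r \\ 0 \end{bmatrix} = Q_rR_r$$

where $\tilde Q$ consists of the remaining columns of Q. In particular the columns of A are contained in the column space of Q[;⍳r], and hence S is contained in this column space. Since both spaces have dimension r, they are equal.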


In the notation of Proposition 5.13, the function PERP below returns a matrix
whose columns form an orthonormal basis of $S^\perp$. The function differs from
ORTHO only in line [3], which drops the first r columns of Q.


    ∇ Z←PERP A;R;B
[1]   B←⌈/,|A←((⍴A),(1=⍴⍴A)⍴1)⍴A
[2]   R←(⍉Z←B HHLDR A)+.×A
[3]   Z←(0,+/∨/B≠B+R)↓Z
    ∇
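Here is a Python sketch of ORTHO and PERP together. It is our port; the names and the 1e-13 relative tolerance are assumptions. The block carries its own copy of a recursive Householder reduction so that it runs on its own. It computes $R = Q^TA$ and counts the rows of R that are not negligible compared to B. It then splits the columns of Q at that rank, as Proposition 5.20 describes.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def sprflctr(v, rel_tol=1e-13):
    n = len(v)
    L = math.sqrt(sum(x * x for x in v))
    if L == 0 or math.sqrt(sum(x * x for x in v[1:])) <= rel_tol * L:
        return identity(n)
    s = list(v)
    s[0] += L if v[0] >= 0 else -L
    ss = sum(x * x for x in s)
    return [[2 * s[i] * s[j] / ss - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def hhldr(A, B):
    """Orthogonal Q with Q^T A upper echelon (recursive Householder)."""
    m = len(A)
    n = len(A[0]) if m else 0
    keep = [j for j in range(n) if any(abs(A[i][j]) > 1e-13 * B for i in range(m))]
    if not keep:
        return identity(m)
    A = [[row[j] for j in keep] for row in A]
    Q1 = sprflctr([row[0] for row in A])
    Qbar = hhldr([row[1:] for row in matmul(Q1, A)[1:]], B)
    return matmul(Q1, [[1.0] + [0.0] * (m - 1)] + [[0.0] + r for r in Qbar])

def split_q(A):
    """Return (U, V): the columns of Q before and after the rank r."""
    B = max(abs(x) for row in A for x in row)
    Q = hhldr(A, B)
    R = matmul([list(c) for c in zip(*Q)], A)
    r = sum(1 for row in R if any(abs(x) > 1e-13 * B for x in row))
    U = [row[:r] for row in Q]     # like ORTHO A: first r columns
    V = [row[r:] for row in Q]     # like PERP A: remaining columns
    return U, V

A = [[1, 6, 7], [4, 3, 7], [2, -3, -1], [1, -2, -1]]
U, V = split_q(A)
print(len(U[0]), len(V[0]))        # rank of A and dimension of the complement
```

Applying `split_q` to the transpose of A instead gives bases for the row space and the null space, as listed in Proposition 5.21.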


The uses of ORTHO and PERP are summarized in the next proposition. Notice
that the null space of A is $(\text{row space of } A)^\perp$.




PROPOSITION 5.21 Let A be a matrix. To compute an orthonormal basis of the
given space, use the given expression:

    Column space of A            ORTHO A
    (Column space of A)⊥         PERP A
    Row space of A               ORTHO ⍉A
    Null space of A              PERP ⍉A


LEAST-SQUARES SOLUTIONS 


We close this section with a description of the algorithm used by the dyadic
function ⌹ to solve systems of linear equations. The equation

$$AX = B$$

has solutions if and only if the vector B lies in the column space of the matrix A
(Proposition 2.12).
A least-squares solution is a vector X that minimizes

$$f(X) = (B - AX)\cdot(B - AX) = \|B - AX\|^2$$

(See Proposition 2.15.) That is, the distance from AX to B is minimal. Since the vectors of the form AX make up the column space of A, it follows that a least-squares solution is a solution to


$$AX = B^{\parallel} = P_SB$$


where S is the column space of A. Let U = ORTHO A. Then $P_S = UU^T$, so we
want a solution of

$$AX = UU^TB$$


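Now suppose that $R = U^TA$, Q = HSHLDR A, and V = PERP A. Then R is upper echelon, $V^TA = 0$, and $Q = [U \mid V]$. Multiplying the equation $AX = UU^TB$ by $Q^T$ will not change the solutions, since Q is invertible (Proposition 2.13):

$$\begin{aligned}
AX &= UU^TB\\
Q^TAX &= Q^TUU^TB\\
[U \mid V]^TAX &= [U \mid V]^TUU^TB\\
\begin{bmatrix} U^TAX \\ V^TAX \end{bmatrix} &= \begin{bmatrix} U^TUU^TB \\ V^TUU^TB \end{bmatrix}\\
\begin{bmatrix} RX \\ 0 \end{bmatrix} &= \begin{bmatrix} U^TB \\ 0 \end{bmatrix}
\end{aligned}$$

or

$$RX = U^TB$$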


since $U^TU = I$ and $V^TU = 0$. Thus the least-squares solutions of AX = B are the
solutions of $RX = U^TB$, the equation we get if we simply multiply AX = B by $U^T$.

Domino does not compute U explicitly, however. First the augmented matrix [A|B] is formed, and the Householder algorithm is applied to the augmented matrix until the matrix is reduced to upper echelon form. This is equivalent to multiplying the augmented matrix by $Q^T$:

$$Q^T[A \mid B] = \begin{bmatrix} U^T \\ V^T \end{bmatrix}[A \mid B] = \begin{bmatrix} U^TA & U^TB \\ V^TA & V^TB \end{bmatrix} = \begin{bmatrix} R & U^TB \\ 0 & V^TB \end{bmatrix}$$

The lower portions, the 0 and $V^TB$, can simply be discarded and we have the augmented matrix

$$[R \mid U^TB]$$


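It is not difficult to solve $RX = U^TB$ by completing the reduction of R to echelon form. If R is invertible, the system is

$$\begin{aligned}
r_{11}x_1 + r_{12}x_2 + \cdots + r_{1n}x_n &= c_1\\
r_{22}x_2 + \cdots + r_{2n}x_n &= c_2\\
&\;\;\vdots\\
r_{nn}x_n &= c_n
\end{aligned}$$

where $c = U^TB$ and $r_{ii} \neq 0$ for all i. In this case one can start at the bottom with a pivot on $r_{nn}$, then a pivot on $r_{n-1,n-1}$, and so on. This process is called back-substitution.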
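The whole least-squares procedure fits in a few lines of Python. The sketch below is ours and is not how ⌹ is implemented. It assumes A has independent columns and obtains U by modified Gram-Schmidt instead of Householder reflections. It then forms the invertible upper triangular $R = U^TA$ and $c = U^TB$, and solves RX = c by back-substitution from the bottom row up. The data are hypothetical: a line $y = x_1 + x_2t$ is fitted to four points.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def ortho(columns):
    """Orthonormal columns by modified Gram-Schmidt (independent columns assumed)."""
    us = []
    for a in columns:
        v = [float(x) for x in a]
        for u in us:
            c = dot(u, v)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        us.append([vi / norm for vi in v])
    return us

def back_substitute(R, c):
    """Solve R x = c for square upper triangular R with nonzero diagonal."""
    n = len(c)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):        # start at the bottom row
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

def least_squares(A_cols, b):
    U = ortho(A_cols)
    R = [[dot(u, a) for a in A_cols] for u in U]    # R = U^T A
    c = [dot(u, b) for u in U]                      # c = U^T B
    return back_substitute(R, c)

t = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 4.0]
x = least_squares([[1.0] * 4, t], y)
print(x)          # intercept and slope, approximately [1.5, 1.0]
```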

If R is not invertible, then ⌹ assumes the original matrix to be “ill conditioned” and returns a DOMAIN ERROR.

Nothing in this procedure forces B to be a vector. The process works exactly the same way if B is a matrix, and thus B⌹A is defined for matrices B as well.

The monadic function ⌹A sets B←ID 1↑⍴A and then calls the dyadic function ⌹ to compute B⌹A.


BACK-SUBSTITUTION 


Our treatment of linear equations is so far somewhat incomplete. Methods of reducing a system of linear equations to echelon form or, now, upper echelon form have been described, and APL functions (ECHELON, GAUS…, HHLDR) that implement the procedures have been written.

The process of writing down the answer once the echelon form has been obtained, however, has not been described with sufficient formality to enable an APL function to be written (but see exercises 40 and 43 of Exercises 4.5).

The function BACKSUB described below remedies this situation. The expression

      Z←S BACKSUB R,B

solves the equation RX = B, where R is upper echelon (in particular, if R is in echelon form). The scalar S sets the scale for comparisons to zero. Numbers are taken to be zero if they are negligible compared to S.

The description of the function BACKSUB is somewhat technical and may be omitted without loss of continuity.

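To begin, assume that the system AX = B is of the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{rn}x_n &= b_r\\
0 &= b_{r+1}\\
&\;\;\vdots\\
0 &= b_m
\end{aligned}$$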


First, if $b_i \neq 0$ for some $i > r$, then the system is inconsistent and need not be considered further. (The actual comparison, of course, is whether $b_i$ is negligible compared to S.) The result returned by BACKSUB in this case will be (n,0)⍴0, the empty set in $R^n$.

Otherwise, if $a_{rn} \neq 0$, we take $x_n = b_r/a_{rn}$ and then move up to the next equation to get a value for the next variable. It is this “moving up” process that must be considered in detail.

Suppose first that we have moved up through the second equation, thus assigning values to $x_2, x_3, \dots, x_n$. That is, assume that we have solved the system of equations in the box with the solid outline (equations 2 through m, in the unknowns $x_2, \dots, x_n$) and to this solution we wish to add one more equation (the first) and one more variable ($x_1$). Once we describe how this is done, we are well on the way to a formal description of a solution process that proceeds by induction on the number of variables, that is, the number of columns of A.

If there is a unique solution, $x_i = c_i$ for $i = 2, 3, \dots, n$, to the smaller system and if $a_{11} \neq 0$, then there is a unique solution to the original system:

$$x_1 = \frac{b_1 - a_{12}c_2 - a_{13}c_3 - \cdots - a_{1n}c_n}{a_{11}}, \qquad x_i = c_i \quad (i = 2, \dots, n)$$


If there is a k-flat of solutions (k > 0) to the smaller system and if $a_{11} \neq 0$, then there is a k-flat of solutions to the original system. Suppose that the solutions of the smaller system make up the k-flat defined by $p_0, p_1, \dots, p_k$ in $R^{n-1}$ (we are labeling our axes $x_2, x_3, \dots, x_n$ in $R^{n-1}$ here). Suppose that these points are stored as the columns of a matrix Z. Then the solutions of the original system form the k-flat in $R^n$ defined by the columns of

      ((B[1]-(1↓A[1;])+.×Z)÷A[1;1]),[1]Z

that is, the above formula for the case k = 0 applied to all the columns of Z.

If there is a k-flat of solutions of the smaller system and $a_{11} = 0$, then a problem arises. If $p = (p_2, p_3, \dots, p_n)$ is a solution to the smaller system and $a_{11} \neq 0$, then we have the flexibility to choose a scalar $x_1 = p_1$ so that the equation

$$a_{11}p_1 + a_{12}p_2 + a_{13}p_3 + \cdots + a_{1n}p_n = b_1$$

is satisfied. If $a_{11} = 0$, however, we have no room for adjustment, and it can happen that $a_{12}p_2 + a_{13}p_3 + \cdots + a_{1n}p_n \neq b_1$. This means that no first component can be added to p to get a solution of the original system. Thus, before using the first equation to define $x_1$, we must make sure that we have the set of solutions of the system of equations in the larger box with the dotted outline (all m equations, in the unknowns $x_2, \dots, x_n$). This complicates the inductive assumption slightly. If $a_{11} \neq 0$, we assume the solution of the (n − 1)-dimensional system in the solid box; if $a_{11} = 0$, we assume the solution of the (n − 1)-dimensional system in the larger dotted box.

In the case $a_{11} = 0$, suppose that the solutions of the larger system in the dotted box make up the k-flat defined by the points $p_0, p_1, \dots, p_k$ in $R^{n-1}$. Then the solutions of the original system define a (k + 1)-flat in $R^n$, since the echelon form has one more nonpivot column. This (k + 1)-flat can be defined by

$$q_0 = (1, p_0),\quad q_1 = (0, p_0),\quad q_2 = (0, p_1),\quad \dots,\quad q_{k+1} = (0, p_k)$$

Thus if $p_0, p_1, \dots, p_k$ are stored as columns of Z, then the points $q_0, q_1, \dots, q_{k+1}$ are the columns of

      (1,Z[;1]),0,[1]Z

Of course if the (n − 1)-dimensional system is inconsistent, then the n-dimensional system is inconsistent as well. So, an (n − 1)-by-0 matrix Z must be replaced by an n-by-0 matrix 0,[1]Z.

The problem of assigning a value to $x_n$, that is, of starting the induction, was passed over too hastily above. The last equations might easily look like this:

$$a_{rk}x_k + 0x_{k+1} + \cdots + 0x_n = b_r$$

In this case it is clearly wrong to start the process with $x_n = b_r/a_{rn}$. We avoid the problem by starting the induction with n = 0 rather than n = 1.
Consider the equation

$$AX = B$$

where A is m by 0 (A←(m,0)⍴0). We take X and B to be vectors. Then, since (⍴X)=1↓⍴A, we must have X = ⍳0, and hence (A+.×X)=m⍴0. This is because a linear combination of an empty set of vectors always results in a zero vector (cf. exercise 23 of Exercises 4.4). It follows that the equation has a solution if and only if B is the zero vector.

The set of solutions of AX = B is a k-flat in $R^n$, where n is the number of columns of A. Say that the k-flat is given by the k + 1 points $p_0, p_1, \dots, p_k$. Our answer is to be such a set of points stored as the columns of an n-by-(k + 1) matrix Z. In the present case n = 0. The space $R^0$ is a single point represented by ⍳0. So if B = 0, then Z is a 0-by-1 matrix, since a single point is a 0-flat. If B ≠ 0, then Z is 0 by 0, representing the empty set. (The empty set is sometimes said to be a −1-dimensional flat.) To summarize: If A has no columns, then Z is 0 1⍴0 if B = 0 and 0 0⍴0 if B ≠ 0.

This establishes the starting values for the induction, but a detail remains to be explained. In working down the diagonal, it is possible to run out of rows before running out of columns. This brings us to the equation

$$AX = B$$

where A has columns but no rows. In this case any choice of X will do, since B = ⍳0 and (A+.×X) = ⍳0 for any X. Thus AX = B has in this case precisely the same solutions as

$$0x_1 + 0x_2 + \cdots + 0x_n = 0$$

For this reason we treat this case in the same way as the case $a_{11} = 0$ discussed above. Of course we cannot use the expression A[1;1]=0 for a test, but we can use ∧/A[;1]=0, since A is always in upper echelon form, or we could use the expression 0=1↑⍴A.




In the function BACKSUB below we let A stand for the augmented matrix of
the system. Thus B←A[;1↓⍴A].


    ∇ Z←S BACKSUB A
[1]   Z←(0,∧/S=S+A[;1↓⍴A])⍴0
[2]   →(1=1↓⍴A)/0
[3]   →(∧/S=S+A[;1])/7
[4]   Z←S BACKSUB 1 1↓A
[5]   Z←(((¯1↑A[1;])-(¯1↓1↓A[1;])+.×Z)÷A[1;1]),[1]Z
[6]   →0
[7]   Z←0,[1]S BACKSUB 0 1↓A
[8]   →(0=1↓⍴Z)/0
[9]   Z←(1,1↓Z[;1]),Z
    ∇


Line [1] initializes Z to 0 1⍴0 if the last column A[;1↓⍴A] is negligible compared to S and to 0 0⍴0 otherwise. If the solution is a flat in $R^0$, then the function stops on line [2]. Line [3] tests if the first column is negligible compared to S. If it is, lines [7], [8], and possibly [9] are executed and the function stops. Otherwise lines [4], [5], and [6] are executed and the function stops. The function cannot reach line [3] unless A has at least two columns. It may be without rows on line [3], however, in which case the function jumps to line [7]. Thus, line [5] cannot be reached unless A contains at least one row and two columns. The expression ¯1↑A[1;] is used in place of B[1], and ¯1↓1↓A[1;] is the vector $(a_{12}, a_{13}, \dots, a_{1n})$.
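A Python transcription of BACKSUB may help in following the recursion. It is our sketch, with several assumptions. The system is stored as a list of augmented rows, and the unknown count n is passed explicitly, because a Python list with zero rows cannot record its number of columns. A relative tolerance replaces APL's fuzzy comparison. Points of the solution flat are returned as lists, and [] means the system is inconsistent. The comments give the corresponding APL lines. Applied to the echelon form of Example 5.23, it returns the points (0, −1, 1, 1, 8), (2, 2, 0, 1, 8), and (6, 7, 0, 0, 8).

```python
def backsub(S, rows, n, rel_tol=1e-13):
    """Solve an upper echelon system given as augmented rows
    [a_i1, ..., a_in, b_i] in n unknowns.  Returns points p0, ..., pk
    (lists of length n) defining the flat of solutions; [] if inconsistent."""
    def negligible(x):
        return abs(x) <= rel_tol * S

    if n == 0:                                           # lines [1]-[2]
        return [[]] if all(negligible(r[-1]) for r in rows) else []
    if all(negligible(r[0]) for r in rows):              # line [3] (true if no rows)
        pts = backsub(S, [r[1:] for r in rows], n - 1, rel_tol)   # line [7]
        if not pts:                                      # line [8]
            return []
        return [[1.0] + pts[0]] + [[0.0] + p for p in pts]        # line [9]
    pts = backsub(S, [r[1:] for r in rows[1:]], n - 1, rel_tol)   # line [4]
    first = rows[0]
    a11, coeffs, b1 = first[0], first[1:-1], first[-1]
    return [[(b1 - sum(a * x for a, x in zip(coeffs, p))) / a11] + p   # line [5]
            for p in pts]

E = [[1, 0, 2, 4, 0, 6],
     [0, 1, 3, 5, 0, 7],
     [0, 0, 0, 0, 1, 8]]
for p in backsub(8.0, E, 5):
    print(p)
```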

To solve the equation AX = B one can use

      (⌈/,|A,B) BACKSUB ECHELON A,B
or
      (⌈/,|A,B) BACKSUB (⍉HSHLDR A)+.×A,B

Least-squares solutions are obtained with

      (⌈/,|A,B) BACKSUB (⍉ORTHO A)+.×A,B


EXAMPLE 5.23 Consider a system of three linear equations in the five unknowns $x_1, \dots, x_5$. This is AX = B, where A is the 3-by-5 coefficient matrix and B is the vector of constants.




The echelon form of the augmented matrix [A|B] is


      ⎕←E←ECHELON A,B
1.000 0.000 2.000 4.000 0.000 6.000
0.000 1.000 3.000 5.000 0.000 7.000
0.000 0.000 0.000 0.000 1.000 8.000


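Thus the set of solutions forms a plane in $R^5$ with parametric representation

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 6 \\ 7 \\ 0 \\ 0 \\ 8 \end{bmatrix} + s\begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -4 \\ -5 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$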


On the other hand BACKSUB produces points $p_0$, $p_1$, $p_2$ defining this plane.

      ⎕←ANS←S BACKSUB E
 0.000 2.000 6.000
¯1.000 2.000 7.000
 1.000 0.000 0.000
 1.000 1.000 0.000
 8.000 8.000 8.000


We can check the accuracy by looking at B − AX for each of these points:

      BBB←⍉3 3⍴B
      BBB-A+.×ANS
0.00E0 0.00E0 0.00E0
0.00E0 0.00E0 0.00E0
0.00E0 0.00E0 0.00E0


(This extraordinary accuracy arises from the special form of A and B.) 
On the other hand we may use HSHLDR to reduce A,B to upper echelon
form.

      ⎕←R←(⍉HSHLDR A)+.×A,B
…         1.62E0    ¯1.79E¯18  1.63E0    ¯4.08E¯1  6.52E0
…
8.67E¯19  1.30E¯18  2.17E¯18   3.04E¯18  7.07E¯1   5.66E0
Applying BACKSUB to R produces 


      ⎕←ANS2←S BACKSUB R
 1.70E¯17  2.00E0  6.00E0
¯1.00E0    2.00E0  7.00E0
 1.00E0    0.00E0  0.00E0
 1.00E0    1.00E0  0.00E0
 8.00E0    8.00E0  8.00E0


This time there has been a minimal amount of round-off error (this system 
carries almost eighteen decimal digits). 


      BBB-A+.×ANS2
0.00E0    1.99E¯17  1.39E¯17
1.21E¯17  1.04E¯17  6.94E¯18
0.00E0    0.00E0    6.94E¯18


In spite of the outcome of this example one may in general expect better 
accuracy from HSHLDR than from ECHELON. 


EXERCISES 5.5


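1. Assume $\varepsilon^2$ negligible compared to 1 but ε not negligible compared to 1 (i.e., |ε| > ⎕CT but ε² < ⎕CT). Let

$$A = \begin{bmatrix} 1 & 1 & 1 \\ \varepsilon & 0 & 0 \\ 0 & \varepsilon & 0 \\ 0 & 0 & \varepsilon \end{bmatrix}$$

(a) Show that applying the Gram-Schmidt process to the columns of A produces

$$U = \begin{bmatrix} 1 & 0 & 0 \\ \varepsilon & -1/\sqrt{2} & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{2} \end{bmatrix}$$

and the columns of U are not orthonormal in machine arithmetic.
(b) Show that if Q = HSHLDR A, then the columns of Q, and hence of U = Q[;⍳3], are orthonormal in machine arithmetic.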




In exercises 2 through 6 use the Gram-Schmidt process to construct a QR factorization of
the matrix A (cf. exercises 21 through 25 of Exercises 5.4).

4. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$


In exercises 7 through 14 an APL function is specified. Write an APL function meeting the 
specifications. 


7. Name: PRJCTR
Right argument: A matrix A
Result: The perpendicular projection matrix $P_S$, where S is the column space of A
Remarks: Use ORTHO. Test your function on exercises 26 through 31 of Exercises 5.4.

8. Name: COMPS
Left argument: A vector v
Right argument: A matrix A
Result: A matrix Z with Z[;1] = $v^{\parallel}$, Z[;2] = $v^{\perp}$, the components of v parallel and perpendicular to the column space of A
Remarks: PRJCTR from exercise 7 can be used. Test the function on exercises 26 through 31 of Exercises 5.4.

9. Name: RFLCTR
Right argument: A matrix A
Result: The matrix $R_S$ of reflection in the column space S of A
Remarks: Use ORTHO and the function PRJCTR of exercise 7. Test the function on exercises 26 through 31 of Exercises 5.4.

10. Name: SBSP
Right argument: A set of points $p_0, p_1, \dots, p_k$ defining a flat in $R^n$, stored as the columns of a matrix
Result: The generators $v_i = p_i - p_0$ of the subspace parallel to the flat, stored as the columns of a matrix

11. Name: ORDIST
Right argument: Points $p_0, p_1, \dots, p_k$ defining a flat, stored as the columns of a matrix
Result: The minimum distance from the flat to the origin
Remarks: Use the results of exercises 10 and 7 or 8. Test the function on exercises 34 through 49 of Exercises 5.4.

12. Name: PROJ
Right argument: Points $p_0, p_1, \dots, p_k$ defining a flat, stored as the columns of a matrix
Result: The affine function, $f(X) = b + AX$, of perpendicular projection onto the flat, stored as an augmented matrix [b|A]
Remarks: Use the results of exercises 10 and 7. Test the function on exercises 34 through 49 of Exercises 5.4.


56 Inertia and Principal-Component Analysis 361 


13. Name: REFL
Right argument: Points $p_0, p_1, \dots, p_k$ defining a flat, stored as the columns of a matrix
Result: The affine function, $f(X) = b + AX$, of reflection in the flat, stored as an augmented matrix [b|A]
Remarks: Use the results of exercises 10 and 9. Test the function on exercises 34 through 49 of Exercises 5.4.


14. (Computer assignment) Use PERP and Proposition 5.21 to solve exercises 40
through 47 of Exercises 5.4.


(Computer assignment) In exercises 15 through 18 a linear system from Exercises 3.4 
is referred to. Solve the system by (a) using BACKSUB and ECHELON, (b) using 
BACKSUB and HSHLDR. If the system is inconsistent, find a least-squares solution using 
BACKSUB and ORTHO. 


15. Exercise 1.        16. Exercise 3.
17. Exercise 4.        18. Exercise 5.
5.6* Inertia and Principal-Component Analysis 


(Rayleigh’s Principle) 


Suppose that we have a problem involving a large number of variables and we 
wish to simplify things by reducing the number of variables under consideration. 
We should do this in a way that minimizes the amount of information lost.

To be specific, suppose that two variables called x and y are involved. Perhaps
x represents weight and y height for a fixed population of individuals. Suppose
that the heights and weights of n individuals are measured, giving the data points
$p_i = (x_i, y_i)$, $i = 1, 2, \dots, n$.

If we reduce the number of variables by considering only x or by considering
only y, then we are projecting our data points $p_i$ onto the x or y axis and working
with the projections instead of the original points $p_i$. This suggests the following
procedure (Figure 5.21). Choose a new coordinate system x′y′ such that projection
onto the x′ axis changes the points $p_i$ as little as possible. Then replace the two
variables x and y by the single variable x′. Before we can do this, we need to


FIGURE 5.21 


362 Orthogonality 


define what we mean by “change the points $p_i$ as little as possible.” In practice this is taken to mean that the quantity $\sum_{i=1}^n d_i^2$, the sum of the squares of the perpendicular distances $d_i$ from the points $p_i$ to the $x'$ axis, is a minimum.

Notice that this is not the same problem as that of fitting a least-squares straight line to the data $(x_i, y_i)$, $i = 1, 2, \ldots, n$. In the case of the least-squares straight line the sum of the squares of the vertical distances is minimized (see Example 5.24 below).

The general case of reducing $n$ variables to $k < n$ variables reduces to the following geometric problem.

MINIMIZATION PROBLEM. Let $p_1, \ldots, p_m$ be points in $\mathbb{R}^n$, and let $F$ be a $k$-flat in $\mathbb{R}^n$. Let the affine function $\pi_F$ be perpendicular projection onto $F$. Find $F$ so that the number
$$\sum_{i=1}^m \|p_i - \pi_F(p_i)\|^2$$
is a minimum.

This problem arises, for the case $n = 3$, in classical mechanics. Suppose that the points are all particles with the same mass $\mu$. For $k = 1$ the flat $F$ is a line and the moment of inertia of the system of masses about this line is
$$\mu \sum_{i=1}^m \|p_i - \pi_F(p_i)\|^2 .$$
Thus the line sought is the one that gives the smallest moment of inertia.

For arbitrary $k$ and $n$ the flat $F$ goes through the center of gravity.


Proposition 5.22. The $k$-flat $F$ of the minimization problem contains the centroid (or mean)
$$\bar p = \frac{1}{m}\sum_{i=1}^m p_i .$$

Proof. Notice that the sum of the deviations about the centroid is zero:
$$\sum_{i=1}^m (p_i - \bar p) = \sum_{i=1}^m p_i - m\bar p = m\bar p - m\bar p = 0 .$$


Now let $F$ be the flat of the minimization problem, $S$ the unique subspace parallel to $F$, and $q$ any point in $F$. Let $q = q^S + q^\perp$ and $\bar p = \bar p^S + \bar p^\perp$ with respect to $S$ (so $q^S, \bar p^S \in S$ and $q^\perp, \bar p^\perp \in S^\perp$). Let $G$ be the $k$-flat through $\bar p$ parallel to $S$ and $F$. The perpendicular projection maps onto $F$ and $G$ are then (see Section 5.4)
$$\pi_F(X) = q^\perp + P_S X,$$
$$\pi_G(X) = \bar p^\perp + P_S X,$$


FIGURE 5.22 


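where $P_S$ is the perpendicular projection matrix for $S$. Thus
$$\pi_F(X) = \pi_G(X) + q^\perp - \bar p^\perp$$
(Figure 5.22). Thus
$$\begin{aligned}
\sum_{i=1}^m \|p_i - \pi_F(p_i)\|^2
&= \sum_{i=1}^m \|p_i - \pi_G(p_i) - (q^\perp - \bar p^\perp)\|^2 \\
&= \sum_{i=1}^m \|(p_i - \pi_G(p_i)) + (\bar p - q)^\perp\|^2 \\
&= \sum_{i=1}^m \|p_i - \pi_G(p_i)\|^2 + 2\sum_{i=1}^m (p_i - \pi_G(p_i))^T (\bar p - q)^\perp + m\|\bar p^\perp - q^\perp\|^2 .
\end{aligned}$$
Now
$$\begin{aligned}
\sum_{i=1}^m (p_i - \pi_G(p_i))^T (\bar p - q)^\perp
&= \big((\bar p - q)^\perp\big)^T \sum_{i=1}^m (p_i - \bar p) + \big((\bar p - q)^\perp\big)^T \sum_{i=1}^m (\bar p - \pi_G(p_i)) \\
&= 0 + 0 = 0 .
\end{aligned}$$
The first term above is zero, since $\sum_{i=1}^m (p_i - \bar p) = 0$, and the second term is zero,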


FIGURE 5.23


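since each of the terms $\bar p - \pi_G(p_i)$ is parallel to $G$ and $(\bar p - q)^\perp$ is perpendicular to $G$. Thus
$$\sum_{i=1}^m \|p_i - \pi_F(p_i)\|^2 = \sum_{i=1}^m \|p_i - \pi_G(p_i)\|^2 + m\|\bar p^\perp - q^\perp\|^2 ,$$
which is minimal when $q^\perp = \bar p^\perp$, that is, when $F = G$.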


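Since $\bar p$ lies in $F$, we have, by the Pythagorean theorem (Figure 5.23),
$$\begin{aligned}
\sum_{i=1}^m \|p_i - \bar p\|^2 &= \sum_{i=1}^m \left(\|\pi_F(p_i) - \bar p\|^2 + \|p_i - \pi_F(p_i)\|^2\right) \\
&= \sum_{i=1}^m \|\pi_F(p_i) - \bar p\|^2 + \sum_{i=1}^m \|p_i - \pi_F(p_i)\|^2 .
\end{aligned}$$
Now the quantity $\sum \|p_i - \bar p\|^2$ is a constant. It follows that $\sum \|p_i - \pi_F(p_i)\|^2$ is a minimum precisely when $\sum \|\pi_F(p_i) - \bar p\|^2$ is a maximum. We have turned the minimization problem into a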


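MAXIMIZATION PROBLEM. Let $\bar p, p_1, p_2, \ldots, p_m$ be points in $\mathbb{R}^n$ ($\bar p$ need not be the centroid of $p_1, \ldots, p_m$). Let $F$ be a $k$-flat containing $\bar p$ and let $\pi_F$ be perpendicular projection onto $F$. Find $F$ so that the number
$$\sum_{i=1}^m \|\pi_F(p_i) - \bar p\|^2$$
is a maximum.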


To discuss the maximization problem, change to new coordinates by translating the origin to $\bar p$. This simplifies the formulas. The flat $F$ is now a subspace, call it $S$ as in the proof of Proposition 5.22, and $\pi_F$ is multiplication by $P_S$, the perpendicular projection matrix for $S$.

We will now derive a formula for
$$\sum_{i=1}^m \|\pi_F(p_i) - \bar p\|^2$$


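in terms of the coordinates of the points $p_i$ in the new coordinate system. Let $u_1, u_2, \ldots, u_k$ be an orthonormal basis of $S$ and let $U = [u_1|u_2|\cdots|u_k]$. Then $P_S = UU^T$ and, since $\bar p = 0$,
$$\sum_{i=1}^m \|\pi_F(p_i) - \bar p\|^2 = \sum_{i=1}^m \|P_S p_i\|^2 .$$
Now for each $i$ we have, since $U^T U = I$,
$$\begin{aligned}
\|P_S p_i\|^2 &= (UU^T p_i)^T (UU^T p_i) = p_i^T U (U^T U) U^T p_i = p_i^T U U^T p_i \\
&= (U^T p_i)^T (U^T p_i) = (u_1^T p_i)^2 + (u_2^T p_i)^2 + \cdots + (u_k^T p_i)^2 .
\end{aligned}$$
Now
$$(u_j^T p_i)^2 = (u_j^T p_i)(u_j^T p_i) = (u_j^T p_i)(p_i^T u_j) = u_j^T (p_i p_i^T) u_j ,$$
and so
$$\sum_{i=1}^m \|P_S p_i\|^2 = \sum_{i=1}^m \sum_{j=1}^k u_j^T (p_i p_i^T) u_j = \sum_{j=1}^k u_j^T \Big(\sum_{i=1}^m p_i p_i^T\Big) u_j .$$
Now let $A = [p_1|p_2|\cdots|p_m]$. Then
$$A^T = \begin{bmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_m^T \end{bmatrix} \qquad\text{and}\qquad AA^T = \sum_{i=1}^m p_i p_i^T .$$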


The next proposition summarizes this calculation. If the flat $F$ contains $\bar p$ and $S$ is the unique subspace parallel to $F$, then $\pi_F(p_i) - \bar p = P_S p_i - P_S \bar p$ (Figure 5.22 with $G$ replacing $F$).

Proposition 5.23. Let the points $\bar p, p_1, p_2, \ldots, p_m$ in $\mathbb{R}^n$ be given. Let $S$ be a subspace of $\mathbb{R}^n$ with orthonormal basis $u_1, \ldots, u_k$, and let $P_S$ be the perpendicular projection matrix for $S$. Let $A = [p_1 - \bar p \,|\, p_2 - \bar p \,|\cdots|\, p_m - \bar p]$. Then
$$\sum_{i=1}^m \|P_S p_i - P_S \bar p\|^2 = \sum_{j=1}^k u_j^T A A^T u_j .$$
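The identity in Proposition 5.23 is easy to spot-check numerically. The sketch below is written in Python rather than the book's APL, and the four points and the subspace $S$ (spanned by the hypothetical orthonormal vectors u1, u2) are arbitrary choices made only for illustration; it evaluates both sides of the identity.

```python
from math import cos, sin

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

# Hypothetical data: four points in R^3 and their centroid p_bar.
points = [(1.0, 2.0, 0.5), (3.0, -1.0, 2.0), (0.0, 0.0, 1.0), (2.0, 4.0, -1.0)]
m = len(points)
p_bar = [sum(coord) / m for coord in zip(*points)]

# Orthonormal basis u1, u2 of a 2-dimensional subspace S (an arbitrary choice).
t = 0.7
u1 = [cos(t), sin(t), 0.0]
u2 = [0.0, 0.0, 1.0]
basis = [u1, u2]

def project(x):
    """P_S x = (u1^T x) u1 + (u2^T x) u2."""
    out = [0.0] * len(x)
    for u in basis:
        c = dot(u, x)
        out = [o + c * ui for o, ui in zip(out, u)]
    return out

# Columns of A are p_i - p_bar; P_S p_i - P_S p_bar = P_S (p_i - p_bar) by linearity.
cols = [[pi - bi for pi, bi in zip(p, p_bar)] for p in points]
lhs = sum(dot(project(a), project(a)) for a in cols)

n = len(p_bar)
AAT = [[sum(a[r] * a[s] for a in cols) for s in range(n)] for r in range(n)]
rhs = sum(dot(u, matvec(AAT, u)) for u in basis)

print(lhs, rhs)  # the two numbers agree up to rounding
```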


The matrix $AA^T$ of Proposition 5.23 is symmetric. Thus the next proposition solves the maximization problem.


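Proposition 5.24. Let $A$ be an $n \times n$ symmetric matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. If $u_1, u_2, \ldots, u_k$ is an orthonormal set of vectors, define
$$f(u_1, \ldots, u_k) = \sum_{i=1}^k u_i^T A u_i .$$
Then
$$\lambda_{n-k+1} + \cdots + \lambda_n \le f(u_1, \ldots, u_k) \le \lambda_1 + \lambda_2 + \cdots + \lambda_k .$$
Further,
$$f(u_1, \ldots, u_k) = \lambda_1 + \lambda_2 + \cdots + \lambda_k$$
when $u_i$ is a unit eigenvector for $\lambda_i$, $1 \le i \le k$, and
$$f(u_1, \ldots, u_k) = \lambda_n + \lambda_{n-1} + \cdots + \lambda_{n-k+1}$$
when $u_i$ is a unit eigenvector for $\lambda_{n-i+1}$, $1 \le i \le k$.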


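Proof. Since $A$ is symmetric, it can be diagonalized by an orthonormal change of coordinates. Assume that the coordinate system has been chosen so that
$$A = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$$
We will show that $f(u_1, \ldots, u_k) - \sum_{i=1}^k \lambda_i \le 0$. Let $U = [u_1|u_2|\cdots|u_k]$, and write $U[i;]$ for row $i$ of $U$, so that $\|U[i;]\|^2 = u_1[i]^2 + \cdots + u_k[i]^2$. Now, since $A$ is diagonal,
$$\begin{aligned}
f(u_1, \ldots, u_k) - \sum_{i=1}^k \lambda_i
&= u_1^T A u_1 + u_2^T A u_2 + \cdots + u_k^T A u_k - \sum_{i=1}^k \lambda_i \\
&= \sum_{i=1}^n \lambda_i u_1[i]^2 + \sum_{i=1}^n \lambda_i u_2[i]^2 + \cdots + \sum_{i=1}^n \lambda_i u_k[i]^2 - \sum_{i=1}^k \lambda_i \\
&= \sum_{i=1}^n \lambda_i \|U[i;]\|^2 - \sum_{i=1}^k \lambda_i \\
&= \sum_{i=1}^k \lambda_i \left(\|U[i;]\|^2 - 1\right) + \sum_{i=k+1}^n \lambda_i \|U[i;]\|^2 .
\end{aligned}$$
Now for $i \le k$ we have $\lambda_i \ge \lambda_k$, and hence $-\lambda_i \le -\lambda_k$. It can be shown (exercise 6) that $1 - \|U[i;]\|^2$ is nonnegative, and hence
$$\lambda_i\left(\|U[i;]\|^2 - 1\right) = (-\lambda_i)\left(1 - \|U[i;]\|^2\right) \le (-\lambda_k)\left(1 - \|U[i;]\|^2\right), \qquad i \le k .$$
Further, since $\lambda_k \ge \lambda_i$ for $i > k$, we have
$$\sum_{i=k+1}^n \lambda_i \|U[i;]\|^2 \le \lambda_k \sum_{i=k+1}^n \|U[i;]\|^2 .$$
Thus, since $\sum_{i=1}^n \|U[i;]\|^2 = \|u_1\|^2 + \cdots + \|u_k\|^2 = k$,
$$\begin{aligned}
f(u_1, \ldots, u_k) - \sum_{i=1}^k \lambda_i
&\le (-\lambda_k)\sum_{i=1}^k \left(1 - \|U[i;]\|^2\right) + \lambda_k \sum_{i=k+1}^n \|U[i;]\|^2 \\
&= \lambda_k\Big(-k + \sum_{i=1}^k \|U[i;]\|^2 + \sum_{i=k+1}^n \|U[i;]\|^2\Big) \\
&= \lambda_k(-k + k) \\
&= 0 .
\end{aligned}$$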


Similarly (exercise 7) one may show that
$$f(u_1, \ldots, u_k) - \sum_{i=n-k+1}^n \lambda_i \ge 0 .$$

Notice that we get
$$f(u_1, \ldots, u_k) = \sum_{i=1}^k \lambda_i$$
if we take U = (n,k)↑ID n and
$$f(u_1, \ldots, u_k) = \sum_{i=n-k+1}^n \lambda_i$$
when U = (n,¯k)↑ID n. By our choice of coordinate system the columns of ID n are unit eigenvectors for $A$.


In the case $k = 1$ Proposition 5.24 is known as Rayleigh's principle.

Proposition 5.25 (Rayleigh's principle). Let $A$ be a symmetric matrix with maximum eigenvalue $\lambda_1$ and minimum eigenvalue $\lambda_n$. Let $X$ be any nonzero vector in $\mathbb{R}^n$ and let $f(X)$ be the Rayleigh quotient:
$$f(X) = \frac{X^T A X}{X^T X} .$$
Then $\lambda_1 \ge f(X) \ge \lambda_n$.
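Rayleigh's principle can be illustrated numerically as well. In this Python sketch (not from the book) the matrix A is a hypothetical 2 by 2 example with eigenvalues 3 and 1; sampling directions around the circle shows the Rayleigh quotient staying between the two eigenvalues and reaching each bound along the corresponding eigenvector.

```python
from math import cos, sin, pi

A = [[2.0, 1.0], [1.0, 2.0]]  # symmetric, eigenvalues 3 and 1

def rayleigh(A, x):
    """Rayleigh quotient X^T A X / X^T X for a nonzero vector x."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)

# Sample one direction per degree; the quotient depends only on the direction of x.
values = [rayleigh(A, [cos(k * pi / 180), sin(k * pi / 180)]) for k in range(360)]
print(min(values), max(values))  # approximately 1.0 and 3.0
print(rayleigh(A, [5.0, 5.0]))   # along the eigenvector (1, 1) the quotient is 3.0
```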


SOLUTION TO THE MINIMIZATION PROBLEM
Let
$$\bar p = \frac{1}{m}\sum_{i=1}^m p_i$$
and define the matrix $A$ by $A[;i] = p_i - \bar p$. The flat $F$ that solves the minimization problem is defined by the points $\bar p, \bar p + u_1, \ldots, \bar p + u_k$, where $u_1, \ldots, u_k$ are orthonormal and $u_i$ is an eigenvector belonging to the $i$th largest eigenvalue of $AA^T$.


Definition 5.14. The solution to the minimization problem is called the principal $k$-flat. The eigenvectors $u_1, \ldots, u_k$ above are the first $k$ principal axes.


Now we will write an APL function to compute the principal $k$-flats of a set of points $p_1, p_2, \ldots, p_m$ in $\mathbb{R}^n$. We will use the function JACOBI from Section 5.2 to diagonalize $AA^T$. The output of JACOBI, however, does not give the eigenvalues of $AA^T$ sorted in descending order. To sort the eigenvalues we use the APL function downgrade, denoted by ⍒. If v is a vector, then ⍒v is the set of indices of v, that is, ⍳⍴v, arranged so that v[⍒v] is in descending order.


      v←3 1 7 ¯2 6 4 ¯8 1
      ⍒v
3 5 6 1 2 8 4 7
      v[⍒v]
7 6 4 3 1 1 ¯2 ¯8
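For readers following along outside APL, the behavior of ⍒ can be mimicked in Python. This is only an analogue, not the book's code: downgrade is a hypothetical helper name, and it returns 1-based indices to match APL's default index origin. Python's sorted is stable, so tied entries keep their original order, as they do under APL's grade functions.

```python
def downgrade(v):
    """Hypothetical analogue of APL's downgrade: 1-based indices putting v in descending order."""
    # Sorting the indices by -v[i] with a stable sort keeps ties in their original order.
    return [i + 1 for i in sorted(range(len(v)), key=lambda i: -v[i])]

v = [3, 1, 7, -2, 6, 4, -8, 1]
g = downgrade(v)
print(g)                      # [3, 5, 6, 1, 2, 8, 4, 7]
print([v[i - 1] for i in g])  # [7, 6, 4, 3, 1, 1, -2, -8]
```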


For computational purposes we start with a matrix $P = [p_1|p_2|\cdots|p_m]$ of data points. The centroid can then be computed as

      AVE ⍉P

where AVE, defined in Section 3.1, computes the averages of the columns of a data matrix. Alternately, one can use the expression P+.÷m. This takes care of
Next we need a function to compute the principal axes from the data matrix $P$.

      ∇ Z←PRAXES P
[1]   P←P-⍉(⌽⍴P)⍴P+.÷(⍴P)[2]
[2]   Z←JACOBI P+.×⍉P
[3]   Z←Z[;⍒1 1⍉(⍉Z)+.×P+.×(⍉P)+.×Z]
      ∇


On line [1] the centroid is computed as P+.÷m and $P$ is replaced by $[p_1 - \bar p \,|\, p_2 - \bar p \,|\cdots|\, p_m - \bar p]$. On line [2] the principal axes of $PP^T$ are computed and stored as columns of Z. On line [3] the columns of Z are arranged in descending order according to the size of the corresponding eigenvalues.
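The three lines of PRAXES can be mirrored in Python for readers without an APL system. This is a sketch under stated assumptions, not the book's code: jacobi_eigen is a hypothetical stand-in for JACOBI (a plain cyclic Jacobi iteration that returns eigenvalues and a matrix of eigenvectors), praxes is a hypothetical name, and the sample data are the tennis-racket points of Example 5.25 generated from their description rather than copied from a printout.

```python
from math import atan2, cos, sin, pi

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def jacobi_eigen(S, tol=1e-20, max_sweeps=50):
    """Cyclic Jacobi iteration for a symmetric matrix S (hypothetical stand-in for JACOBI).
    Returns (eigenvalues, V) with unit eigenvectors as the columns of V."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0:
                    continue
                # tan(2*theta) = 2 A[p][q] / (A[q][q] - A[p][p]) zeroes entry (p, q) of J^T A J
                theta = 0.5 * atan2(2 * A[p][q], A[q][q] - A[p][p])
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p] = J[q][q] = cos(theta)
                J[p][q], J[q][p] = sin(theta), -sin(theta)
                A = matmul(transpose(J), matmul(A, J))
                V = matmul(V, J)
    return [A[i][i] for i in range(n)], V

def praxes(P):
    """Python analogue of PRAXES: P has one row per coordinate, one column per point.
    Returns (principal axes as columns, sorted eigenvalues, centroid)."""
    n, m = len(P), len(P[0])
    centroid = [sum(row) / m for row in P]                      # line [1]
    A = [[x - c for x in row] for row, c in zip(P, centroid)]   # columns p_i - p_bar
    evals, V = jacobi_eigen(matmul(A, transpose(A)))            # line [2]
    order = sorted(range(n), key=lambda j: -evals[j])           # line [3], like ⍒
    Z = [[V[i][j] for j in order] for i in range(n)]
    return Z, [evals[j] for j in order], centroid

# The eighteen points of the tennis racket in Example 5.25.
handle = [(-0.3 * k, 0.0, 0.0) for k in range(10, 0, -1)]
head = [(1 + cos(k * pi / 4), sin(k * pi / 4), 0.0) for k in range(8)]
P = transpose(handle + head)   # 3 rows, 18 columns
Z, evals, centroid = praxes(P)
print(centroid)   # first entry close to -0.472
print(Z)          # the identity matrix
```

Sorting an index list by the eigenvalues, rather than sorting the eigenvalues themselves, keeps each eigenvector attached to its eigenvalue, which is the same reason line [3] of PRAXES indexes Z with a grade.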


EXAMPLE 5.24 Consider the isosceles triangle with vertices at $(-a, 0)$, $(0, 1)$, and $(a, 0)$ in the $xy$ plane (Figure 5.24). Let us calculate the principal line (1-flat) for the vertices. We have
$$P = \begin{bmatrix} -a & 0 & a \\ 0 & 1 & 0 \end{bmatrix}, \qquad \bar p = \frac{1}{3}\begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
$$A = \begin{bmatrix} -a & 0 & a \\ -\tfrac13 & \tfrac23 & -\tfrac13 \end{bmatrix}.$$
Hence
$$AA^T = \begin{bmatrix} 2a^2 & 0 \\ 0 & \tfrac23 \end{bmatrix}.$$
Since $AA^T$ is diagonal, the principal axes are $(1, 0)$ and $(0, 1)$. If $a^2 > \frac13$, then $u_1 = (1, 0)$ and the principal line is the horizontal line through $\bar p = (0, \frac13)$. If $a^2 < \frac13$, then $u_1 = (0, 1)$ and the principal line is the $y$ axis. If $a^2 = \frac13$, then the principal line is undefined. In this case any line through $\bar p$ yields a minimal value of $\sum d_i^2$.

FIGURE 5.24

The case $a^2 = \frac13$ occurs when the triangle is equilateral. Thus the principal line is horizontal when the third side of the isosceles triangle is longer than the equal sides, and the principal line is vertical when the third side is the shorter.

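Now let us compute the least-squares straight line fitting these points that minimizes the sum of the squares of the vertical distances. The methods of Section 2.3 (see Example 2.28) give
$$\begin{bmatrix} 1 & 1 & 1 \\ -a & 0 & a \end{bmatrix}\begin{bmatrix} 1 & -a \\ 1 & 0 \\ 1 & a \end{bmatrix} X = \begin{bmatrix} 1 & 1 & 1 \\ -a & 0 & a \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$
or
$$\begin{bmatrix} 3 & 0 \\ 0 & 2a^2 \end{bmatrix} X = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad X = \begin{bmatrix} \tfrac13 \\ 0 \end{bmatrix}.$$
Thus the least-squares straight line is $y = \frac13$ and, in contrast to the principal line, is independent of $a$.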
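Both conclusions of Example 5.24 can be checked for particular values of $a$. The following Python sketch (the helper name triangle_summary is hypothetical) forms $AA^T$ for the three vertices, reports which axis is principal, and fits the least-squares line using the centered form of the normal equations of Section 2.3, which is algebraically the same fit.

```python
def triangle_summary(a):
    """Example 5.24 for a given a > 0: centroid, A A^T, least-squares line (c0, c1)
    for y = c0 + c1 x, and the direction of the principal line."""
    pts = [(-a, 0.0), (0.0, 1.0), (a, 0.0)]
    m = len(pts)
    xbar = sum(x for x, _ in pts) / m
    ybar = sum(y for _, y in pts) / m
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    syy = sum((y - ybar) ** 2 for _, y in pts)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
    AAT = [[sxx, sxy], [sxy, syy]]
    # Least-squares line from the normal equations, written in centered form.
    c1 = sxy / sxx
    c0 = ybar - c1 * xbar
    # A A^T is diagonal here (sxy == 0), so the larger diagonal entry picks the axis.
    if abs(sxx - syy) < 1e-12:
        axis = "undefined"
    else:
        axis = "horizontal" if sxx > syy else "vertical"
    return (xbar, ybar), AAT, (c0, c1), axis

for a in (0.5, 3 ** -0.5, 1.0):
    print(a, triangle_summary(a))
```

For every $a$ the fitted line has intercept $\frac13$ and slope 0, while the reported axis switches from vertical to horizontal as $a^2$ passes $\frac13$.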


The next example is from physics.


EXAMPLE 5.25 Consider the set of points in Figure 5.25. There are ten points spaced uniformly along the $x$ axis from $-3$ to $-0.3$ and eight points spaced uniformly around the circle with radius 1 and center at 1 on the $x$ axis.

We will refer to this configuration as a “tennis racket.”

The tennis racket lies in $\mathbb{R}^3$. The $z$ axis is perpendicular to the page. The coordinates of the eighteen points are stored as the columns of the matrix $P$.


FIGURE 5.25

      ⍉P
¯3.00E0    0.00E0    0.00E0
¯2.70E0    0.00E0    0.00E0
¯2.40E0    0.00E0    0.00E0
¯2.10E0    0.00E0    0.00E0
¯1.80E0    0.00E0    0.00E0
¯1.50E0    0.00E0    0.00E0
¯1.20E0    0.00E0    0.00E0
¯9.00E¯1   0.00E0    0.00E0
¯6.00E¯1   0.00E0    0.00E0
¯3.00E¯1   0.00E0    0.00E0
 2.00E0    0.00E0    0.00E0
 1.71E0    7.07E¯1   0.00E0
 1.00E0    1.00E0    0.00E0
 2.93E¯1   7.07E¯1   0.00E0
 0.00E0    1.03E¯18  0.00E0
 2.93E¯1  ¯7.07E¯1   0.00E0
 1.00E0   ¯1.00E0    0.00E0
 1.71E0   ¯7.07E¯1   0.00E0


The center of gravity of the tennis racket is on the handle near the strings.

      AVE ⍉P
¯0.472 0 0

The directions of the principal axes, in descending order, are

      PRAXES P
1 0 0
0 1 0
0 0 1


The first principal axis is along the handle. The second principal axis is perpendicular to the handle and in the plane of the racket. The third principal axis is through the centroid and perpendicular to the plane of the racket.

Recall that the kinetic energy of a mass $m$ moving with velocity $v$ is $\frac12 mv^2$. In theoretical mechanics it is shown that the energy of a mass rotating about a line $l$ with angular velocity $\omega$ is $\frac12 I_l \omega^2$, where $I_l$ is the moment of inertia about the line $l$. If we hold $\omega$ constant and let $l$ vary, then the energy is a minimum when $l$ is the first principal axis. It follows from Rayleigh's principle that the energy is a maximum if the racket is spinning about the third principal axis (exercise 8).
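The ordering of the rotational energies can be checked directly for the tennis racket. The Python sketch below assumes unit mass at each of the eighteen points (the example does not specify masses) and computes the moment of inertia $\sum_i (\|q_i\|^2 - (u^T q_i)^2)$ about the line through the centroid with each principal direction $u$, where $q_i$ is the $i$th point measured from the centroid.

```python
from math import cos, sin, pi

# Tennis racket of Example 5.25 with an assumed unit mass at each point.
handle = [(-0.3 * k, 0.0, 0.0) for k in range(10, 0, -1)]
head = [(1 + cos(k * pi / 4), sin(k * pi / 4), 0.0) for k in range(8)]
pts = handle + head
m = len(pts)
centroid = [sum(p[i] for p in pts) / m for i in range(3)]
q = [[p[i] - centroid[i] for i in range(3)] for p in pts]

def inertia(u):
    """Moment of inertia about the line through the centroid with unit direction u."""
    return sum(sum(x * x for x in v) - sum(a * b for a, b in zip(u, v)) ** 2 for v in q)

I1, I2, I3 = (inertia(e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1]))
print(I1, I2, I3)  # roughly 4.0, 42.64, 46.64
```

With $\omega$ fixed, the smallest value $I_1$ (about the handle) gives the least energy and $I_3$ (perpendicular to the racket) the greatest.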

If a tennis racket is tossed into the air, its motion may be analyzed as follows. The centroid moves along some path. At any given moment the racket is rotating about a line $l$ through the centroid. The direction of the line $l$ through the centroid changes in relation to the racket with time. Neglecting air resistance, the energy of rotation about $l$ will be constant while the racket is in the air.




If the racket is launched with $l$ parallel to the third principal axis, the direction of $l$ will remain fixed; it cannot change without the system losing rotational energy. Thus the racket flies with a steady rotation about the third principal axis. This is easy to check using a real racket. Similarly, if the racket is launched with $l$ along the first principal axis, then the direction of $l$ cannot change unless the racket gains energy. This is harder to check with a real racket, since the shape makes it hard to launch a racket rotating only about this axis. These two rotational modes are stable. Rotation about the second principal axis, however, is unstable. The direction of the line $l$ may change without changing the rotational energy of the system. It is essentially impossible to make a tennis racket rotate steadily about the second principal axis.


PRINCIPAL-COMPONENT ANALYSIS 


Principal-component analysis is a statistical technique for reducing the number of variables in a problem. Let us return to the height and weight example that began this section. A sample of $m$ individuals is chosen from a population and their heights and weights are measured, giving $m$ points $p_i = (x_i, y_i)$, a two-variable problem. One then chooses $x'y'$ coordinates as follows. The origin $O'$ is the centroid $\bar p$, the $x'$ axis is along the first principal axis, and the $y'$ axis is perpendicular. The $y'$ coordinate is discarded and one works with the $x'$ coordinate.

In this way we lose the minimal amount of information. But how much information is lost? Let $F$ be the principal $k$-flat and let $\pi_F$ be the perpendicular projection function. If $\bar p$ is the centroid, then by the Pythagorean theorem we have
$$\sum_{i=1}^m \|p_i - \bar p\|^2 = \sum_{i=1}^m \|\pi_F(p_i) - \bar p\|^2 + \sum_{i=1}^m \|p_i - \pi_F(p_i)\|^2 ,$$
and $F$ is chosen to minimize the second term on the right-hand side. If we divide this equation by $m$ ($m - 1$ for a sample), we have a statement about variances. Upon division by the left-hand side the $m$ or $m - 1$ in the denominator will cancel and we obtain
$$\frac{\sum_{i=1}^m \|\pi_F(p_i) - \bar p\|^2}{\sum_{i=1}^m \|p_i - \bar p\|^2} = \text{fraction of variance retained},$$
$$\frac{\sum_{i=1}^m \|p_i - \pi_F(p_i)\|^2}{\sum_{i=1}^m \|p_i - \bar p\|^2} = \text{fraction of variance lost}.$$


The proof of the next proposition is left as exercise 9. 


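Proposition 5.26. Let $p_1, \ldots, p_m$ in $\mathbb{R}^n$ be given. Let $F$ be the principal $k$-flat for $p_1, p_2, \ldots, p_m$. Let $\bar p$ be the centroid and set $A = [p_1 - \bar p \,|\, p_2 - \bar p \,|\cdots|\, p_m - \bar p]$. If the eigenvalues of $AA^T$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then
$$\frac{\sum_{i=1}^m \|\pi_F(p_i) - \bar p\|^2}{\sum_{i=1}^m \|p_i - \bar p\|^2} = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_n} .$$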
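Proposition 5.26 gives a cheap way to report how much variance is kept. As a small illustration (hypothetical data, Python rather than APL, and a closed-form 2 by 2 eigenvalue formula in place of JACOBI), the fraction of variance retained by the principal line is computed both from the projections and from the eigenvalues:

```python
from math import hypot

def eig2_sym(a, b, d):
    """Eigenvalues (largest first) of [[a, b], [b, d]] and a unit eigenvector for the largest."""
    mean, r = (a + d) / 2, hypot((a - d) / 2, b)
    l1, l2 = mean + r, mean - r
    if b != 0:
        vx, vy = b, l1 - a           # solves (M - l1 I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = hypot(vx, vy)
    return (l1, l2), (vx / norm, vy / norm)

# Hypothetical two-variable data set.
pts = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
m = len(pts)
xbar = sum(x for x, _ in pts) / m
ybar = sum(y for _, y in pts) / m
dev = [(x - xbar, y - ybar) for x, y in pts]   # the columns of A

sxx = sum(dx * dx for dx, _ in dev)
syy = sum(dy * dy for _, dy in dev)
sxy = sum(dx * dy for dx, dy in dev)
(l1, l2), u = eig2_sym(sxx, sxy, syy)          # A A^T = [[sxx, sxy], [sxy, syy]]

retained = sum((u[0] * dx + u[1] * dy) ** 2 for dx, dy in dev)
total = sum(dx * dx + dy * dy for dx, dy in dev)
print(retained / total, l1 / (l1 + l2))        # both about 0.95
```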
EXERCISES 5.6*


1. (a) How does the principal line of the four points $(a, 1)$, $(a, -1)$, $(-a, 1)$, and $(-a, -1)$ vary with $a > 0$?
(b) Show that $\sum_{i=1}^4 \|\pi_F(p_i) - \bar p\|^2 = 4$ if $p_1 = (1, 1)$, $p_2 = (1, -1)$, $p_3 = (-1, 1)$, $p_4 = (-1, -1)$ and the flat $F$ is the subspace with orthonormal basis $\{(\cos\theta, \sin\theta)\}$.
2. Find the principal 1-flat for the set of points $(0, 4)$, $(1, 1)$, $(3, 3)$, and $(4, 0)$. Use Proposition 4.10 or 4.11 to diagonalize $AA^T$.

3. The set of sixteen points in $\mathbb{R}^3$ given in the accompanying table can be thought of as a discrete approximation to a book. Find the stable and unstable axes of rotation of this “book.” Test your conclusions with a real book.
Note: This is not a computer exercise.
Hint: Tape the book shut before you throw it in the air.
4. Distance and perpendicularity are Euclidean concepts. This means that the principal $k$-flat of a set of points is a Euclidean invariant of the set of points: any congruence that carries set $S_1$ to set $S_2$ also carries the principal $k$-flat for $S_1$ to the principal $k$-flat for $S_2$. In particular, if the set $S$ is mapped into itself by a rotation or reflection, then the principal $k$-flat will be mapped into itself under the same rotation or reflection. Use this fact to find the principal 2-flat for the “book” of exercise 3 without computation.


5. (Computer assignment) Approximate a boomerang by placing particles at the points $(i, 0, 0)$ for $i = -7, -6, -5, -4, -3, -2, -1$ and at the points $(0, i, 0)$ for $i = 0, 1, 2, 3, 4, 5$. Find the stable and unstable axes of rotation for the boomerang.

6. (a) Show that if $Q$ is a square orthogonal matrix, then $Q^T$ is an orthogonal matrix.
(b) From (a) show that if $Q$ is $n$ by $n$ and the columns of $Q$ form an orthonormal set of vectors, then the rows of $Q$ also form an orthonormal set of vectors.
(c) As in the proof of Proposition 5.24, let the columns of $U$ be orthonormal. Let $S$ be the column space of $U$ and let $S^\perp$ be the orthogonal complement of $S$. Let the columns of $V$ form an orthonormal basis of $S^\perp$. By applying part (b) to the matrix $Q = [U|V]$ show that $\|U[i;]\| \le 1$ for all valid row indices $i$.


7. Finish the proof of Proposition 5.24 by showing that
$$f(u_1, \ldots, u_k) \ge \sum_{i=n-k+1}^n \lambda_i .$$


8. Verify the statement made in Example 5.25 that the energy of the tennis racket is maximal when it is spinning about the third principal axis.
Hint: Assuming that $F$ contains $\bar p$, show that $\sum \|p_i - \pi_F(p_i)\|^2$ is a maximum when $\sum \|\pi_F(p_i) - \bar p\|^2$ is a minimum. To do this, imitate the argument in the text leading from the minimization problem to the maximization problem.


9. Prove Proposition 5.26.
Hint: Let $u_i$ be a unit eigenvector belonging to $\lambda_i$, $1 \le i \le n$, and change coordinates by the formula $X = \bar p + [u_1|u_2|\cdots|u_n]X'$.


CHAPTER SIX 


Linear Programming 


Linear programming is a technique widely used in industry as a way of helping 
management make decisions. It is also used in large numerical economic models. 
Linear programming is an optimization technique. It is concerned with finding 
maxima and minima of linear functions $f: \mathbb{R}^n \to \mathbb{R}$. Of course such a function $f(X)$ does not usually have maxima or minima if $X$ is allowed to range over all of $\mathbb{R}^n$. In linear programming problems $f(X)$ has a maximum or a minimum because $X$ is not allowed to vary freely but is constrained to be in a specified region of $\mathbb{R}^n$. The constraints imposed are linear inequalities, and this is what makes the subject part of linear algebra.

Section 6.1 is devoted to exhibiting examples of linear programming problems and the technique used to put such problems into standard forms.

In Section 6.2 the geometry of linear programming problems is discussed. The constraints of a linear programming problem restrict the domain of the function $f(X)$ to the region enclosed by a polyhedron in $\mathbb{R}^n$, and we gain insight into the problem by looking at the hyperplane $f(X) = \text{const.}$ near a vertex of this polyhedron.

Section 6.3 describes the simplex algorithm for solving linear programming 
problems. This is the algorithm upon which many large computational packages 
for solving linear programming problems are based. 

In Section 6.4 we offer some theoretical as opposed to computational applications of linear programming and the closely related topic of game theory.


6.1 Examples of Linear Programming Problems 


In this section we concentrate on phrasing problems in linear programming form. 
Solution methods are deferred to later sections. 
We begin with a typical problem involving the allocation of limited resources. 


EXAMPLE 6.1 A farmer grows two crops, say corn and soybeans. Suppose that each acre planted with corn requires 1 man-day of labor, $3 investment (seed and


376 Linear Programming 


so on), and yields a profit of $35 at harvest. Each acre of soybeans, on the other hand, requires 2 man-days of labor, $1 investment, and returns a profit of $20 at harvest. The farmer has a total of 70 acres available for these two crops. He has $140 he can invest and 80 man-days of labor available. How many acres of each crop should he plant for maximum profit?


Setup. Let $x_1$ be the number of acres of corn planted and $x_2$ the number of acres of soybeans. Then
$$\text{profit} = 35x_1 + 20x_2 .$$
This is the function we wish to maximize. This function is unbounded in the whole $x_1x_2$ plane, but the possible values of $x_1$ and $x_2$ are quite restricted.
Since the farmer cannot plant a negative amount of land, we must have
$$x_1 \ge 0 \quad\text{and}\quad x_2 \ge 0 .$$
Further, since only 70 acres are available,
$$x_1 + x_2 \le 70 .$$
The limit on available labor imposes the constraint
$$x_1 + 2x_2 \le 80 ,$$
and the limited amount of capital available forces
$$3x_1 + x_2 \le 140 .$$
The problem can be summarized as
$$\begin{aligned} \text{maximize}\quad & z = 35x_1 + 20x_2 \\ \text{subject to}\quad & x_1 + x_2 \le 70 \\ & x_1 + 2x_2 \le 80 \\ & 3x_1 + x_2 \le 140 \\ & x_1 \ge 0,\ x_2 \ge 0 \end{aligned}$$


We will see in Section 6.3 that the maximum profit is obtained with 40 acres of corn and 20 acres of soybeans. This uses all the labor and all the capital and leaves 10 acres unplanted for a profit of $1800.
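Since Example 6.1 has only two unknowns, the quoted optimum can be confirmed without the simplex method by a brute-force check that is practical only for tiny problems: intersect every pair of boundary lines, keep the intersections that satisfy all the constraints (the vertices of the feasible region), and compare the profit at each. This relies on the fact that a linear function that attains a maximum on such a polygon attains it at a vertex. The sketch is Python with exact fractions.Fraction arithmetic; the helper names vertices and profit are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Example 6.1 constraints as (g1, g2, h), meaning g1*x1 + g2*x2 <= h (positivity included).
constraints = [(1, 1, 70), (1, 2, 80), (3, 1, 140), (-1, 0, 0), (0, -1, 0)]

def profit(x1, x2):
    return 35 * x1 + 20 * x2

def vertices(cons):
    """Feasible intersection points of pairs of boundary lines."""
    for (a1, b1, h1), (a2, b2, h2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                           # parallel lines do not meet
        x1 = Fraction(h1 * b2 - h2 * b1, det)  # Cramer's rule
        x2 = Fraction(a1 * h2 - a2 * h1, det)
        if all(a * x1 + b * x2 <= h for a, b, h in cons):
            yield x1, x2

best = max(vertices(constraints), key=lambda v: profit(*v))
print(best, profit(*best))  # the vertex (40, 20) with profit 1800
```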


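The system of inequalities can be rewritten in matrix form. If $v$ and $w$ are vectors, we will write $v \le w$ if $v[i] \le w[i]$ for every $i$. Then, setting
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 3 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 70 \\ 80 \\ 140 \end{bmatrix}, \qquad c = \begin{bmatrix} 35 \\ 20 \end{bmatrix},$$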


6.1 Linear Programming Problems 377 


the system of Example 6.1 becomes
$$\begin{aligned} \text{maximize}\quad & z = c^T X \\ \text{subject to}\quad & AX \le b \qquad\qquad (6.1) \\ & X \ge 0 \end{aligned}$$


Definition 6.1. The system (6.1) is called a standard maximum problem. The function $z$ is called the objective function and the condition $X \ge 0$ is called the positivity condition.

Many authors refer to a standard maximum problem as a primal problem.


EXAMPLE 6.2 A nutritionist is preparing a supplement mixture to add to a commercial preparation of creamed chipped beef on toast. The nutritionist has three mixtures available that can be combined to give the desired mixture. Each gram of the first mixture contains 1 unit of calcium, 1 unit of iron, and costs 70 cents. Each gram of the second mixture contains 1 unit of calcium, 2 units of iron, and costs 80 cents. Each gram of the third mixture contains 3 units of calcium, 1 unit of iron, and costs $1.40. Each batch of the product needs 35 units of calcium and 20 units of iron. How much of each mixture should be added to each batch of the product to meet these requirements at minimum cost?


Setup. Let $y_i$ be the amount of the $i$th mix to be added. Then the cost of the additives is, in cents per batch,
$$w = 70y_1 + 80y_2 + 140y_3 .$$
Again we have the restrictions $y_i \ge 0$. Further, since 35 units of calcium are needed, we must have
$$y_1 + y_2 + 3y_3 \ge 35 ,$$
and the requirement of 20 units of iron per batch forces
$$y_1 + 2y_2 + y_3 \ge 20 .$$
The mathematics problem can be summarized as
$$\begin{aligned} \text{minimize}\quad & w = 70y_1 + 80y_2 + 140y_3 \\ \text{subject to}\quad & y_1 + y_2 + 3y_3 \ge 35 \\ & y_1 + 2y_2 + y_3 \ge 20 \\ & y_1 \ge 0,\ y_2 \ge 0,\ y_3 \ge 0 \end{aligned}$$


In Section 6.3 we will see that the minimum cost of $18 is attained by adding 5 grams of the second mixture and 10 grams of the third mixture to give precisely 35 units of calcium and 20 units of iron per batch.


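Taking
$$B = \begin{bmatrix} 1 & 1 & 3 \\ 1 & 2 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 70 \\ 80 \\ 140 \end{bmatrix}, \qquad c = \begin{bmatrix} 35 \\ 20 \end{bmatrix},$$
we may rephrase the problem of Example 6.2 as
$$\begin{aligned} \text{minimize}\quad & w = b^T Y \\ \text{subject to}\quad & BY \ge c \qquad\qquad (6.2) \\ & Y \ge 0 \end{aligned}$$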


Definition 6.2. The system (6.2) is called a standard minimum problem. The function $w$ is called the objective function and the condition $Y \ge 0$ the positivity condition.


We will see in Section 6.2 that there is a close relation between the problems of Examples 6.1 and 6.2; indeed the solution of one yields the solution of the other.


Definition 6.3. If in Equations (6.1) and (6.2) we have $B = A^T$, then (6.2) is said to be the dual of (6.1) and (6.1) is said to be the dual of (6.2).
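The two solutions quoted for Examples 6.1 and 6.2 can be checked against each other. By the standard weak-duality inequality, if $X$ is feasible for (6.1) and $Y$ is feasible for (6.2) with $B = A^T$, then $c^T X \le (A^T Y)^T X = Y^T (AX) \le Y^T b$; so feasible points with $c^T X = b^T Y$ are optimal for both problems. The Python sketch below verifies this for the data of the two examples, taking $Y = (0, 5, 10)$ from the solution quoted in Example 6.2.

```python
A = [[1, 1], [1, 2], [3, 1]]
b = [70, 80, 140]
c = [35, 20]
X = [40, 20]      # solution quoted for Example 6.1
Y = [0, 5, 10]    # solution quoted for Example 6.2 (grams of each mixture)

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

AT = [list(col) for col in zip(*A)]   # B = A^T
primal_ok = min(X) >= 0 and all(l <= r for l, r in zip(matvec(A, X), b))
dual_ok = min(Y) >= 0 and all(l >= r for l, r in zip(matvec(AT, Y), c))
cX = sum(ci * xi for ci, xi in zip(c, X))
bY = sum(bi * yi for bi, yi in zip(b, Y))
print(primal_ok, dual_ok, cX, bY)  # True True 1800 1800
```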


We have not given, and will not give, a formal definition of linear programming except to say that a linear programming problem is one that can be reduced to either a standard maximum problem or a standard minimum problem. In Section 6.3 we will write an APL function MAN to simultaneously solve a dual pair of standard problems. Most problems do not automatically come in standard form, however, and a variety of techniques are used to rephrase problems in standard form.


EXAMPLE 6.3 A manufacturer has two plants and three distribution points (warehouses). To meet the local demand for the product, the first distribution point
requires 3 carloads of product per week. The second requires 2 carloads per week, 
and the third requires 3 carloads per week. The first plant can produce 4 carloads 
per week and the second plant 5 carloads per week. The shipping charges, in 
hundreds of dollars per carload, are indicated in Figure 6.1. 

Determine a shipping schedule that minimizes the shipping cost subject to the 
restriction that each distribution point receives the required amount. 


FIGURE 6.1


Setup. Let $x_i$ be the number of carloads shipped along the route indicated in Figure 6.1. Then the cost of shipping is
$$w = x_1 + x_2 + 3x_3 + 2x_4 + 3x_5 + 5x_6 .$$
We wish to minimize $w$ subject to the following restrictions. $P_1$ cannot ship more than 4 carloads:
$$x_1 + x_3 + x_5 \le 4 .$$
$P_2$ cannot ship more than 5 carloads:
$$x_2 + x_4 + x_6 \le 5 .$$
$D_1$ needs at least 3 carloads:
$$x_1 + x_2 \ge 3 .$$
$D_2$ needs at least 2 carloads:
$$x_3 + x_4 \ge 2 .$$
$D_3$ needs at least 3 carloads:
$$x_5 + x_6 \ge 3 .$$


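In summary,
$$\begin{aligned} \text{minimize}\quad & w = x_1 + x_2 + 3x_3 + 2x_4 + 3x_5 + 5x_6 \\ \text{subject to}\quad & x_1 + x_3 + x_5 \le 4 \\ & x_2 + x_4 + x_6 \le 5 \\ & x_1 + x_2 \ge 3 \\ & x_3 + x_4 \ge 2 \\ & x_5 + x_6 \ge 3 \\ & x_1, x_2, \ldots, x_6 \ge 0 \end{aligned}$$
This system is a minimum problem but is not in standard form. We easily remedy this by multiplying the first two inequalities by $-1$ to obtain
$$\begin{aligned} \text{minimize}\quad & w = x_1 + x_2 + 3x_3 + 2x_4 + 3x_5 + 5x_6 \\ \text{subject to}\quad & -x_1 - x_3 - x_5 \ge -4 \\ & -x_2 - x_4 - x_6 \ge -5 \\ & x_1 + x_2 \ge 3 \\ & x_3 + x_4 \ge 2 \\ & x_5 + x_6 \ge 3 \\ & x_1, x_2, \ldots, x_6 \ge 0 \end{aligned}$$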


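The system can also be rewritten directly as a standard maximum problem. We need only notice that $w$ is a minimum precisely when $z = -w$ is a maximum. Thus we have the standard maximum problem
$$\begin{aligned} \text{maximize}\quad & z = -x_1 - x_2 - 3x_3 - 2x_4 - 3x_5 - 5x_6 \\ \text{subject to}\quad & x_1 + x_3 + x_5 \le 4 \\ & x_2 + x_4 + x_6 \le 5 \\ & -x_1 - x_2 \le -3 \\ & -x_3 - x_4 \le -2 \\ & -x_5 - x_6 \le -3 \\ & x_1, x_2, \ldots, x_6 \ge 0 \end{aligned}$$
Using the methods of Section 6.3, we can show that the solution is $X = (1, 2, 0, 2, 3, 0)$, which has the second plant producing 1 carload per week under capacity.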
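The brute-force idea extends, slowly, to Example 6.3: with six unknowns and eleven constraints (five structural ones plus $x_i \ge 0$), each choice of six constraints held with equality gives a candidate vertex. The Python sketch below solves each 6 by 6 system exactly with Gauss-Jordan elimination and keeps the best feasible one; it is an illustration, not the simplex method of Section 6.3, and solve, feasible, and brute_force_max are hypothetical names. It assumes the route numbering of the constraints above ($x_1, x_3, x_5$ leave $P_1$; $x_2, x_4, x_6$ leave $P_2$). This problem has more than one optimal vertex, so the enumeration may report a different schedule than the quoted one, but with the same optimal value $z = -16$, a shipping cost of 16 hundred dollars per week; the code also checks the quoted schedule.

```python
from fractions import Fraction
from itertools import combinations

def solve(M, r):
    """Solve the square system M x = r exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rv)] for row, rv in zip(M, r)]
    for col in range(n):
        piv = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col] / aug[col][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def feasible(G, h, x):
    return all(sum(g * xi for g, xi in zip(row, x)) <= hi for row, hi in zip(G, h))

def brute_force_max(c, G, h):
    """Maximize c^T x subject to G x <= h by checking every vertex (tiny problems only)."""
    best, best_val = None, None
    for rows in combinations(range(len(G)), len(c)):
        x = solve([G[i] for i in rows], [h[i] for i in rows])
        if x is None or not feasible(G, h, x):
            continue
        val = sum(ci * xi for ci, xi in zip(c, x))
        if best_val is None or val > best_val:
            best, best_val = x, val
    return best, best_val

# Standard maximum form of Example 6.3, with the positivity conditions as -x_i <= 0.
c = [-1, -1, -3, -2, -3, -5]
G = [[1, 0, 1, 0, 1, 0],      # P1 ships at most 4
     [0, 1, 0, 1, 0, 1],      # P2 ships at most 5
     [-1, -1, 0, 0, 0, 0],    # D1 needs at least 3
     [0, 0, -1, -1, 0, 0],    # D2 needs at least 2
     [0, 0, 0, 0, -1, -1]]    # D3 needs at least 3
h = [4, 5, -3, -2, -3]
for i in range(6):
    G.append([-1 if j == i else 0 for j in range(6)])
    h.append(0)

best, best_val = brute_force_max(c, G, h)
quoted = [1, 2, 0, 2, 3, 0]
print(best_val, [str(v) for v in best])
print(feasible(G, h, quoted), sum(ci * xi for ci, xi in zip(c, quoted)))
```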


Example 6.3 illustrates two techniques for rephrasing problems. Inequalities may be reversed by multiplying through by negative one. The problem of minimizing $z$ is the same as that of maximizing $-z$, and the problem of maximizing $z$ is the same as that of minimizing $-z$.


EXAMPLE 6.4 A furniture factory produces three types of couches. The first type uses 1 board foot of framing wood and 3 board feet of cabinet wood, the second requires 2 board feet of framing wood and 2 board feet of cabinet wood, and the third uses 2 board feet of framing wood and 1 board foot of cabinet wood. Currently the factory is producing each month 500 couches of the first type, 300 of the second type, and 200 of the third type.




The supplier informs the factory management that there is a shortage of 
cabinet wood and the supply to the factory will have to be reduced by 600 board 
feet per month. In partial compensation the supply of framing wood may be 
increased by 100 board feet per month. If the profit on the three types of couches 
is $10, $8, and $5, respectively, how should the production of each type of couch 
be adjusted to minimize the decrease, if any, in profits? 


Setup. Let $x_i$ be the change in the number of couches of type $i$ produced each month, positive for an increase in production and negative for a cutback.
The change in profits is then
$$\Delta p = 10x_1 + 8x_2 + 5x_3 .$$
Since 100 more board feet of framing wood will be available, the change in the amount of framing wood used must satisfy
$$x_1 + 2x_2 + 2x_3 \le 100 ,$$
whereas the change in the amount of cabinet wood used must satisfy
$$3x_1 + 2x_2 + x_3 \le -600 .$$


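Since the number of each type of couch produced cannot be less than zero, we have
$$x_1 \ge -500, \qquad x_2 \ge -300, \qquad x_3 \ge -200 .$$
Minimizing the loss means maximizing $\Delta p$. Thus the mathematical problem becomes
$$\begin{aligned} \text{maximize}\quad & \Delta p = 10x_1 + 8x_2 + 5x_3 \\ \text{subject to}\quad & x_1 + 2x_2 + 2x_3 \le 100 \\ & 3x_1 + 2x_2 + x_3 \le -600 \\ & x_1 \ge -500,\ x_2 \ge -300,\ x_3 \ge -200 \end{aligned}$$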


This problem is not in the form of a standard maximum problem, since we do not have the positivity condition. The problem must be recast, since the methods of Section 6.3 assume $x_i \ge 0$.

Two techniques are available for rewriting the problem as a standard maximum problem.

The first technique depends on the fact that although the $x_i$'s may be negative, they are bounded below. If we translate coordinates by the formula
$$X = X' + \begin{bmatrix} -500 \\ -300 \\ -200 \end{bmatrix},$$


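then the condition
$$X \ge \begin{bmatrix} -500 \\ -300 \\ -200 \end{bmatrix}$$
becomes $X' \ge 0$. To make the coordinate change we write the system in matrix form:
$$\begin{aligned} \text{maximize}\quad & z = c^T X \\ \text{subject to}\quad & AX \le b \qquad\qquad (6.3) \\ & X \ge d \end{aligned}$$
where
$$c = \begin{bmatrix} 10 \\ 8 \\ 5 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 2 & 2 \\ 3 & 2 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 100 \\ -600 \end{bmatrix}, \qquad d = \begin{bmatrix} -500 \\ -300 \\ -200 \end{bmatrix},$$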

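and then substitute $X = X' + d$ to get
$$z = c^T X = c^T (X' + d) = c^T X' + c^T d = c^T X' - 8400$$
and
$$AX = A(X' + d) = AX' + Ad \le b ,$$
or
$$AX' \le b - Ad = \begin{bmatrix} 100 \\ -600 \end{bmatrix} - \begin{bmatrix} -1500 \\ -2300 \end{bmatrix} = \begin{bmatrix} 1600 \\ 1700 \end{bmatrix},$$
and
$$X = X' + d \ge d, \qquad\text{or}\qquad X' \ge 0 .$$
Since additive constants change the value of maxima but not their location, we have the standard maximum problem
$$\begin{aligned} \text{maximize}\quad & z' = c^T X' \\ \text{subject to}\quad & AX' \le b - Ad \\ & X' \ge 0 \end{aligned}$$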
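The arithmetic of the translation is easy to confirm. This short Python check (plain integers, no library calls) recomputes $c^T d$ and $b - Ad$ for the data of Example 6.4:

```python
c = [10, 8, 5]
A = [[1, 2, 2], [3, 2, 1]]
b = [100, -600]
d = [-500, -300, -200]   # lower bounds on x1, x2, x3

cTd = sum(ci * di for ci, di in zip(c, d))
Ad = [sum(a * di for a, di in zip(row, d)) for row in A]
b_shifted = [bi - adi for bi, adi in zip(b, Ad)]
print(cTd, Ad, b_shifted)  # -8400 [-1500, -2300] [1600, 1700]
```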


The second method of obtaining a standard maximum problem when $x_i$ may take on negative values is to split $x_i$ into a positive part and a negative part:
$$x_i = x_i^+ - x_i^-, \qquad x_i^+ \ge 0, \quad x_i^- \ge 0 .$$
Then, for example, if $x_i = -1$, we can take $x_i^+ = 0$, $x_i^- = 1 > 0$, or perhaps $x_i^+ = 103 > 0$, $x_i^- = 104 > 0$.

We need not know a lower bound on x, to use this technique. 

In the present example we replace X’ by 


x 


hore EXER) xTSO 


in Equation (6.3) to obtain the standard maximum problem 
7 x* 
maximize z=1r-e[y | 
A = in b 
bject to 4 x 
subject t ie lt le (6) 
ns 


y = 


The system (6.3), incidentally, has an infinity of solutions given by
$$X(t) = \begin{bmatrix} -350 \\ 225 \\ 0 \end{bmatrix} + t\begin{bmatrix} 2 \\ -5 \\ 4 \end{bmatrix}, \qquad -50 \le t \le 105 .$$
For this line segment the objective value does not depend on the parameter:
$$z(t) = c^T X(t) = -1700 \quad\text{for all } t .$$

Thus the profits decrease by $1700 per month (from $8400 to $6700), and this is the smallest possible decrease, given the constraints.
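The family of solutions quoted for (6.3) can also be checked mechanically. The Python sketch below (the helper names X, feasible, and profit_change are hypothetical) confirms that both endpoints $t = -50$ and $t = 105$ are feasible, that stepping just past either endpoint violates a lower bound, and that the change in profit is $-1700$ at every integer $t$ in between. It does not by itself prove optimality; that is the job of the simplex method of Section 6.3.

```python
c = [10, 8, 5]
A = [[1, 2, 2], [3, 2, 1]]
b = [100, -600]
lower = [-500, -300, -200]
x0, v = [-350, 225, 0], [2, -5, 4]

def X(t):
    """Point of the quoted line of solutions at parameter t."""
    return [xi + t * vi for xi, vi in zip(x0, v)]

def feasible(x):
    rows_ok = all(sum(a * xi for a, xi in zip(row, x)) <= bi for row, bi in zip(A, b))
    return rows_ok and all(xi >= li for xi, li in zip(x, lower))

def profit_change(x):
    return sum(ci * xi for ci, xi in zip(c, x))

for t in (-51, -50, 0, 105, 106):
    print(t, feasible(X(t)), profit_change(X(t)))
```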


The infinity of solutions in the last example indicates that more constraints may be imposed without decreasing profits further.


EXAMPLE 6.5 We continue with the problem of Example 6.4. Since the cutback in supplies is short term, the management of the factory wishes to hold the work force constant during the period. If the first type of couch needs 5 man-hours of labor, the second type 7, and the third type 5, how should the production schedule be changed?


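Setup. All the constraints of Example 6.4 are in force. In addition, we have that the change in labor requirements should be zero:
$$5x_1 + 7x_2 + 5x_3 = 0 .$$
Thus we have the system
$$\begin{aligned} \text{maximize}\quad & z = 10x_1 + 8x_2 + 5x_3 \\ \text{subject to}\quad & 5x_1 + 7x_2 + 5x_3 = 0 \\ & x_1 + 2x_2 + 2x_3 \le 100 \qquad\qquad (6.5) \\ & 3x_1 + 2x_2 + x_3 \le -600 \\ & x_1 \ge -500,\ x_2 \ge -300,\ x_3 \ge -200 \end{aligned}$$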


which we wish to rewrite as a standard maximum problem. In Example 6.4 we saw how to take care of the fact that the $x_i$'s may be negative, so the problem is the equality in the first constraint.

Of course, the equality constraint implicitly reduces the number of variables by confining the problem to a hyperplane. We can make this explicit by solving for $x_3$, say, in terms of $x_1$ and $x_2$ and then substituting for $x_3$ in the remaining inequalities, obtaining a system in only two unknowns. In a large system, however, this procedure can involve a fair amount of computation. The common procedure is to increase the number of constraints by one by replacing the equality with a pair of inequalities


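$$\begin{aligned} 5x_1 + 7x_2 + 5x_3 &\le 0 \\ 5x_1 + 7x_2 + 5x_3 &\ge 0 \end{aligned}$$
or
$$\begin{aligned} 5x_1 + 7x_2 + 5x_3 &\le 0 \\ -5x_1 - 7x_2 - 5x_3 &\le 0 \end{aligned}$$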

The addition of the equality constraint makes the solution unique: $x_1 = -420$, $x_2 = 400$, $x_3 = -140$. The first type is cut back by 420 units, the third type by 140 units, and the second type is increased by 400 units (per month).
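Assuming, as the text states, that the optimal points of Example 6.4 form the segment $X(t)$, the extra labor constraint simply picks out the value of $t$ at which $5x_1 + 7x_2 + 5x_3 = 0$. The short exact computation in Python below gives $t = -35$ and $X = (-420, 400, -140)$: the first type is cut back by 420 and the third by 140, while production of the second type rises by 400.

```python
from fractions import Fraction

labor = [5, 7, 5]                   # man-hours per couch of each type
x0, v = [-350, 225, 0], [2, -5, 4]  # X(t) = x0 + t v from Example 6.4

# labor . X(t) = labor . x0 + t (labor . v), which must equal 0
l0 = sum(a * b for a, b in zip(labor, x0))
lv = sum(a * b for a, b in zip(labor, v))
t = Fraction(-l0, lv)
x = [xi + t * vi for xi, vi in zip(x0, v)]
print(t, [int(xi) for xi in x])  # -35 [-420, 400, -140]
```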


NONLINEAR OBJECTIVE FUNCTIONS 


The scope of linear programming problems is much wider than the examples so far suggest. Consider again, for example, the problem of fitting a curve to measured data (Figure 6.2).

The least-squares straight line $y = a + bx$ is the line that minimizes the sum of the squares of the distances
$$d_i = |y_i - (a + bx_i)| , \qquad\text{that is,}\qquad \sum_{i=1}^n d_i^2 .$$
A more intuitive procedure would be to simply minimize the sum of the distances
$$\sum_{i=1}^n d_i .$$




FIGURE 6.2 


This method produces a straight line less influenced by outlying points than the least-squares straight line.

The minimization of $w = \sum d_i$ can be phrased as a linear programming problem. We first prove a preliminary result.

In Example 6.4 we broke variables into positive and negative parts:

$$x = x^+ - x^-, \qquad x^+ \ge 0, \qquad x^- \ge 0$$

There are many choices for $x^+$ and $x^-$, however. If $x = -2$, then an obvious choice is $x^+ = 0$, $x^- = 2$, but $x^+ = 1$, $x^- = 3$, for example, will do as well. The point of the next proposition is that when the proper kind of minimization condition is imposed, then $x^+$ and $x^-$ are determined by the equations


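$$x^+ = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \qquad x^- = \begin{cases} 0 & \text{if } x > 0 \\ -x & \text{otherwise} \end{cases} \tag{6.6}$$

Notice that when Equations (6.6) hold we have

$$|x| = x^+ + x^- \tag{6.7}$$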
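Equations (6.6) pick out one particular split of $x$. A minimal Python sketch (the function name `split` is illustrative only) computes it and checks the properties used below: both parts are nonnegative, their difference is $x$, their sum is $|x|$, and at most one of them is nonzero.

```python
def split(x):
    """Positive and negative parts of a number, following Equations (6.6)."""
    x_plus = x if x > 0 else 0
    x_minus = 0 if x > 0 else -x
    return x_plus, x_minus


for x in (-2, 0, 3.5):
    xp, xm = split(x)
    assert xp >= 0 and xm >= 0
    assert xp - xm == x          # x = x+ - x-
    assert xp + xm == abs(x)     # |x| = x+ + x-
    assert min(xp, xm) == 0      # the two parts are never both positive

print(split(-2))   # (0, 2), whereas x+ = 1, x- = 3 is another valid split
```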


Proposition 6.1 Let $S$ be a subset of $R^n$. Let $C_+$ and $C_-$ be vectors in $R^n$ with $(C_+ + C_-)[J] > 0$ for some vector of indices $J$. Suppose that $X_0^+ \ge 0$, $X_0^- \ge 0$ are vectors in $R^n$ with $X_0 = X_0^+ - X_0^-$ in $S$. Suppose further that

$$m = C_+^T X_0^+ + C_-^T X_0^- = \min\,\{\,C_+^T X^+ + C_-^T X^- : X^+ \ge 0,\ X^- \ge 0,\ X^+ - X^- \text{ in } S\,\}$$

Then $0 = (X_0^+ \lfloor X_0^-)[J]$, and hence the components of $X_0[J]$, $X_0^+[J]$, $X_0^-[J]$ satisfy Equations (6.6) and (6.7).


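Proof Recall that $v \lfloor w$ is the component-by-component minimum of $v$ and $w$. Set $\delta = X_0^+ \lfloor X_0^-$, $Y^+ = X_0^+ - \delta$, $Y^- = X_0^- - \delta$. Then $Y^+ \ge 0$, $Y^- \ge 0$, and

$$Y^+ - Y^- = (X_0^+ - \delta) - (X_0^- - \delta) = X_0^+ - X_0^- = X_0$$

lies in $S$. Thus

$$m \le C_+^T Y^+ + C_-^T Y^- = C_+^T X_0^+ + C_-^T X_0^- - (C_+ + C_-)^T \delta$$

or

$$m \le m - (C_+ + C_-)^T \delta$$

hence

$$0 \le -(C_+ + C_-)^T \delta$$

Now $C_+ + C_- \ge 0$: if some component were negative, adding the same positive amount to that component of both $X^+$ and $X^-$ would leave $X^+ - X^-$ unchanged and make the objective arbitrarily small, so the minimum $m$ could not exist. Since $\delta \ge 0$ as well, this gives $0 \le (C_+ + C_-)[J]^T \delta[J] \le (C_+ + C_-)^T \delta \le 0$, and because every component of $(C_+ + C_-)[J]$ is positive, $\delta[J] = 0$.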
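The construction in the proof can be traced numerically. This sketch uses the hypothetical helpers `reduce_split` and `cost` and made-up data $C_+ = C_- = (1, 1)$. It starts from the nonminimal split $X^+ = (1, 4)$, $X^- = (3, 0)$ of $X = (-2, 4)$; subtracting $\delta = X^+ \lfloor X^-$ from both parts leaves $X^+ - X^-$ unchanged and lowers the objective by $(C_+ + C_-)^T\delta$.

```python
def reduce_split(x_plus, x_minus):
    """Subtract delta = componentwise min(x_plus, x_minus) from both parts."""
    delta = [min(p, m) for p, m in zip(x_plus, x_minus)]
    y_plus = [p - d for p, d in zip(x_plus, delta)]
    y_minus = [m - d for m, d in zip(x_minus, delta)]
    return y_plus, y_minus, delta


def cost(c_plus, c_minus, x_plus, x_minus):
    """Objective C+^T X+ + C-^T X-."""
    return (sum(a * p for a, p in zip(c_plus, x_plus))
            + sum(b * m for b, m in zip(c_minus, x_minus)))


c_plus, c_minus = [1, 1], [1, 1]
x_plus, x_minus = [1, 4], [3, 0]          # a nonminimal split of X = (-2, 4)
y_plus, y_minus, delta = reduce_split(x_plus, x_minus)
print(y_plus, y_minus)                    # [0, 4] [2, 0]
print(cost(c_plus, c_minus, x_plus, x_minus),
      cost(c_plus, c_minus, y_plus, y_minus))   # 8 6
```

The drop from 8 to 6 is exactly $(C_+ + C_-)^T\delta = 2 \cdot 1 + 2 \cdot 0$.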


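Now let us return to the curve-fitting problem. Changing notation a bit, let us take $d_i = y_i - (a + b x_i)$ ($d_i$ now can take on negative values). We wish to minimize $\sum_i |d_i|$. Consider the linear programming problem in the unknowns $a$, $b$, $d_i^+$, $d_i^-$:

$$\begin{aligned} \text{minimize} \quad & w = \sum_{i=1}^{n} (d_i^+ + d_i^-) \\ \text{subject to} \quad & a + x_i b + d_i^+ - d_i^- = y_i, \quad 1 \le i \le n \\ & d_i^+ \ge 0, \quad d_i^- \ge 0 \end{aligned}$$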


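We may apply Proposition 6.1 as follows. The set $S$ is the set of all vectors $(a, b, d_1, \ldots, d_n)$ in $R^{n+2}$ with $a + x_i b + d_i = y_i$ for $1 \le i \le n$. The vectors $C_+$ and $C_-$ are both $(0, 0, 1, \ldots, 1)$, and $J = (3, 4, \ldots, n + 2)$. It follows from Proposition 6.1 that if $(a, b, \delta_1^+, \delta_1^-, \ldots, \delta_n^+, \delta_n^-)$ is a solution of the linear programming problem, then, setting $\delta_i = \delta_i^+ - \delta_i^-$, we have

$$w_0 = \sum_i (\delta_i^+ + \delta_i^-) = \sum_i |\delta_i|$$

Thus if $w_1 = \min \sum_i |d_i|$, we must have $w_1 \le w_0$. On the other hand, given any particular value $w = \sum_i |d_i|$, we can take $d_i^+$ and $d_i^-$ to be the positive and negative parts of $d_i$ given by Equations (6.6) and, by (6.7), get $w = \sum_i (d_i^+ + d_i^-)$. Hence $w_0 \le w_1$, and so the linear programming problem and the problem of minimizing $\sum_i |d_i|$ have the same solution.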

This argument immediately generalizes. First, however, a bit of notation.


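DEFINITION 6.4
1. The $l_1$ norm on $R^n$ is the function

$$\|\cdot\|_1 : R^n \to R$$

given by $\|v\|_1 = \sum_i |v_i|$.

2. The $l_2$ norm on $R^n$ is the function

$$\|\cdot\|_2 : R^n \to R$$

given by $\|v\|_2 = \left(\sum_i v_i^2\right)^{1/2}$.

3. The $l_\infty$ norm on $R^n$ is the function

$$\|\cdot\|_\infty : R^n \to R$$

given by $\|v\|_\infty = \max_i |v_i|$.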
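The three norms translate directly into code. A small Python sketch (the function names are illustrative only):

```python
import math


def norm1(v):
    """l1 norm: sum of absolute values."""
    return sum(abs(t) for t in v)


def norm2(v):
    """l2 (Euclidean) norm."""
    return math.sqrt(sum(t * t for t in v))


def norm_inf(v):
    """l-infinity norm: largest absolute value."""
    return max(abs(t) for t in v)


v = [3, -4]
print(norm1(v), norm2(v), norm_inf(v))   # 7 5.0 4
```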


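The $l_2$ norm is the norm introduced in Chapter 5, and the least-squares solutions of

$$AX = B$$

are the solutions that minimize $\|B - AX\|_2$. In the discussion above we wished to minimize

$$\|D\|_1 = \sum_i |y_i - (a + b x_i)| = \|B - AX\|_1$$

where

$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad X = \begin{bmatrix} a \\ b \end{bmatrix}, \qquad B = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$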


The argument given above can be generalized to prove the next proposition.


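Proposition 6.2 Let $A$ be a matrix and $B$ a vector. The vectors $X$ that minimize the $l_1$ norm of the residuals

$$\|B - AX\|_1$$

are the vectors $X$ that appear in the solutions of the linear programming problem

$$\begin{aligned} \text{minimize} \quad & w = \sum_i (D^+[i] + D^-[i]) \\ \text{subject to} \quad & AX + D^+ - D^- = B \\ & D^+ \ge 0, \quad D^- \ge 0 \end{aligned}$$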


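EXAMPLE 6.6 Suppose the three data points $(-1, 0)$, $(0, 1)$, $(1, 0)$ are given. The arrays $A$, $B$, $X$ of Proposition 6.2 are

$$A = \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad X = \begin{bmatrix} a \\ b \end{bmatrix}$$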


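The vectors $D^+$ and $D^-$ must be the same size as $B$: three components. The scalar equation version of the system of Proposition 6.2 is thus

$$\begin{aligned} \text{minimize} \quad & w = d_1^+ + d_2^+ + d_3^+ + d_1^- + d_2^- + d_3^- \\ \text{subject to} \quad & a - b + d_1^+ - d_1^- = 0 \\ & a + d_2^+ - d_2^- = 1 \\ & a + b + d_3^+ - d_3^- = 0 \\ & d_1^+, d_2^+, d_3^+, d_1^-, d_2^-, d_3^- \ge 0 \end{aligned}$$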


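To put this system in standard maximum form we can replace $w$ by $z = -w$, split $a$, $b$ into positive and negative parts, and replace each equality by a pair of inequalities. The result is a system with ten unknowns, six constraints, and positivity conditions that, in matrix form, could be written

$$\begin{aligned} \text{maximize} \quad & z = [\,0\ 0\ 0\ 0\ {-1}\ \cdots\ {-1}\,] \begin{bmatrix} X^+ \\ X^- \\ D^+ \\ D^- \end{bmatrix} \\ \text{subject to} \quad & \begin{bmatrix} A & -A & I & -I \\ -A & A & -I & I \end{bmatrix} \begin{bmatrix} X^+ \\ X^- \\ D^+ \\ D^- \end{bmatrix} \le \begin{bmatrix} B \\ -B \end{bmatrix} \\ & X^+ \ge 0, \quad X^- \ge 0, \quad D^+ \ge 0, \quad D^- \ge 0 \end{aligned}$$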
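The passage from Example 6.6 to standard maximum form can be automated. The Python sketch below builds the data $(c, M, \text{rhs})$ of the standard maximum problem; `standard_max_l1` is a hypothetical helper, and ordering the unknowns as $(X^+, X^-, D^+, D^-)$ is our assumption. It checks the counts of constraints and unknowns and evaluates one feasible point, $a = b = 0$, whose residuals are $(0, 1, 0)$. It does not solve the problem; a linear programming solver would be needed for that.

```python
def standard_max_l1(A, B):
    """Hypothetical helper: the l1 problem for AX ~ B in standard maximum form.

    Unknowns are ordered (X+, X-, D+, D-).  Returns (c, M, rhs) describing
    maximize c . u  subject to  M u <= rhs,  u >= 0.
    """
    m, n = len(A), len(A[0])
    eye = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    # Row i of [A  -A  I  -I] encodes (AX)[i] + D+[i] - D-[i] <= B[i] ...
    top = [list(A[i]) + [-a for a in A[i]] + eye[i] + [-e for e in eye[i]]
           for i in range(m)]
    # ... and the negated rows [-A  A  -I  I] <= -B supply the reverse inequality.
    bottom = [[-t for t in row] for row in top]
    c = [0] * (2 * n) + [-1] * (2 * m)        # z = -w = -(sum D+ + sum D-)
    return c, top + bottom, list(B) + [-y for y in B]


A = [[1, -1], [1, 0], [1, 1]]                 # Example 6.6
B = [0, 1, 0]
c, M, rhs = standard_max_l1(A, B)
print(len(M), len(M[0]))                      # 6 10

u = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]            # a = b = 0, D+ = (0, 1, 0), D- = 0
print(all(sum(r * x for r, x in zip(row, u)) <= h for row, h in zip(M, rhs)))  # True
print(sum(ci * x for ci, x in zip(c, u)))     # -1
```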


Proposition 6.2 shows that the minimization of $\|B - AX\|_1$ can be rephrased as a linear programming problem. The minimization of $\|B - AX\|_\infty$ can also be treated as a linear programming problem. The trick is to introduce a new variable $t$ constrained by

$$t \ge |d_i| \quad \text{for all } i$$

where $d_i = B[i] - (AX)[i]$. Then the minimal possible value of $t$ is the largest of the values $|d_i|$. This leads to the next proposition, which we state without proof.


Proposition 6.3 Let $A$ be a matrix and $B$ a vector. The vectors $X$ that minimize the $l_\infty$ norm of the residuals

$$\|B - AX\|_\infty$$

are the vectors $X$ that appear in the solutions of the linear programming problem

$$\begin{aligned} \text{minimize} \quad & w = t \\ \text{subject to} \quad & AX + D^+ - D^- = B \\ & t - (D^+ + D^-) \ge 0 \\ & D^+ \ge 0, \quad D^- \ge 0 \end{aligned}$$


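EXAMPLE 6.7 Using the data points $(-1, 0)$, $(0, 1)$, $(1, 0)$ of Example 6.6, the matrices $A$, $B$, $X$, $D^+$, $D^-$ are the same as for Example 6.6. Thus the scalar equation version of the system of Proposition 6.3 is

$$\begin{aligned} \text{minimize} \quad & w = t \\ \text{subject to} \quad & a - b + d_1^+ - d_1^- = 0 \\ & a + d_2^+ - d_2^- = 1 \\ & a + b + d_3^+ - d_3^- = 0 \\ & -d_1^+ - d_1^- + t \ge 0 \\ & -d_2^+ - d_2^- + t \ge 0 \\ & -d_3^+ - d_3^- + t \ge 0 \\ & d_1^+, d_2^+, d_3^+, d_1^-, d_2^-, d_3^- \ge 0 \end{aligned}$$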


Notice that the $l_1$ and $l_2$ curve-fitting techniques will allow individual deviations $|d_i|$ to be large if the overall sum is minimal. The $l_\infty$ fit, on the other hand, concentrates on making the largest deviation $|d_i|$ as small as possible, even if $\sum |d_i|$ or $\sum |d_i|^2$ becomes quite large as a result.
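To see the difference between the $l_1$ and $l_\infty$ criteria on the data of Example 6.6, the sketch below simply searches a coarse grid of lines $y = a + bx$ with exact rational arithmetic. This is a brute-force check, not linear programming; the grid (step 1/10 on $[-2, 2]$) is an arbitrary choice, and for these three points the minimizers happen to lie on it.

```python
from fractions import Fraction

POINTS = [(-1, 0), (0, 1), (1, 0)]          # data of Example 6.6


def residuals(a, b):
    """d_i = y_i - (a + b x_i) for the line y = a + b x."""
    return [y - (a + b * x) for x, y in POINTS]


def l1_error(a, b):
    return sum(abs(d) for d in residuals(a, b))


def linf_error(a, b):
    return max(abs(d) for d in residuals(a, b))


# Exhaustive search over a coarse grid; a sanity check, not an LP solver.
grid = [Fraction(k, 10) for k in range(-20, 21)]
best_l1 = min((l1_error(a, b), a, b) for a in grid for b in grid)
best_linf = min((linf_error(a, b), a, b) for a in grid for b in grid)

print(*best_l1)     # 1 0 0      (error, a, b): the line y = 0
print(*best_linf)   # 1/2 1/2 0  (error, a, b): the line y = 1/2
print(linf_error(0, 0), l1_error(Fraction(1, 2), 0))   # 1 3/2
```

The $l_1$ line $y = 0$ leaves the middle point with a deviation of 1, while the $l_\infty$ line $y = 1/2$ keeps every deviation at $1/2$ at the price of a larger total, $3/2$.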

Adding linear constraints to an $l_1$ or $l_\infty$ minimization problem poses no added problems. A problem such as

$$\begin{aligned} \text{minimize} \quad & w = \|B - AX\|_1 \\ \text{subject to} \quad & EX \le F \end{aligned}$$

can be handled by taking the linear programming problem of Proposition 6.2 and simply adding the constraints $EX \le F$.


EXAMPLE 6.8 A problem need not have an explicit objective function in order to be rephrased as a linear programming problem. Consider the question

Does $AX = B$ have solutions with $X \ge 0$?   (6.8)

This question can be answered by solving either of the problems

$$\begin{aligned} \text{minimize} \quad & w = \|B - AX\|_1 \\ \text{subject to} \quad & X \ge 0 \end{aligned} \qquad \text{or} \qquad \begin{aligned} \text{minimize} \quad & w = \|B - AX\|_\infty \\ \text{subject to} \quad & X \ge 0 \end{aligned}$$

If the solutions of these problems give $w = 0$, then the answer to the question (6.8) is yes; otherwise it is no.

*The problem may also be solved by introducing artificial variables; see Section 6.3.


The problem (6.8) is much more general than it appears. We will see in 
Section 6.2 that every linear programming problem may be recast in that form! 


GOAL PROGRAMMING


Problems using $l_1$ and $l_\infty$ objective functions fall into a general class of techniques known as “goal programming.”

To see the origin of the term consider the set-up of Example 6.3, two plants shipping goods to three distribution points. The firm's management might reasonably have two goals in mind: to keep the plants working at capacity and to give each distribution point precisely what it requires. These goals are in conflict. Indeed, if both plants work at capacity, then 5 + 4 = 9 carloads are shipped per week, but if the distribution points receive exactly what they need, then 3 + 2 + 3 = 8 carloads are received each week, so achieving both these goals is not possible.

If our goals cannot be achieved simultaneously, then what strategy is best? In the above example the goals could be expressed as a matrix equality

$$AX = B$$

that has no solutions. If we cannot meet all our goals, perhaps we should minimize the total deviation from all the goals; that is, perhaps we should minimize

$$w = \|B - AX\|_1$$

On the other hand, it might be reasonable, depending upon the situation, to ensure maximum progress toward all goals, that is, to minimize

$$w = \|B - AX\|_\infty$$


More realistically, some goals may be more important than others. A technique used is to assign “penalties” of various weights to deviations from the various goals. To do this, set $d_i = |B[i] - (AX)[i]|$, the deviation from the $i$th goal. We then attempt to minimize

$$w = c_1 d_1 + c_2 d_2 + \cdots + c_n d_n$$

where the $c_i$'s are the weights. For example, if the first goal is twice as desirable as the second, we might have $c_1 = 2c_2$.




More generally, there is no reason to assume that falling short of a goal carries the same penalty as overshooting. We can weight the two kinds of deviations differently and try to minimize a function of the form

$$w = c_1^+ d_1^+ + c_1^- d_1^- + c_2^+ d_2^+ + c_2^- d_2^- + \cdots + c_n^+ d_n^+ + c_n^- d_n^-$$


Proposition 6.2 shows that such problems can be set up as linear programming 
problems. 

Such specific numerical goals (e.g., ship 5 carloads per week from the second plant) may be mixed with more open-ended goals such as “minimize cost” or “maximize profit.” We content ourselves with a single example.


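EXAMPLE 6.9 Consider again Figure 6.1. Suppose that oversupply at the first distribution point costs $15 per carload per week in added storage and undersupply costs $20 per week in lost revenue. The figures for the other two distribution points are $10 and $15 for the second point and $12 and $13 for the third. Set

$$d_1 = 3 - x_{11} - x_{21}, \qquad d_2 = 2 - x_{12} - x_{22}, \qquad d_3 = 3 - x_{13} - x_{23}$$

Let us assume that the given bounds on the capacities of the two plants still hold and that there is no penalty for allowing a plant to work under capacity. Minimizing the cost of the operation then becomes the mathematical problem

$$\begin{aligned} \text{minimize} \quad & w = c^T X + c_+^T D^+ + c_-^T D^- \\ \text{subject to} \quad & A_1 X + D^+ - D^- = B_1 \\ & A_2 X \le B_2 \end{aligned}$$

where $X^T = [x_{11}\ x_{12}\ x_{13}\ x_{21}\ x_{22}\ x_{23}]$, $(D^+)^T = [d_1^+\ d_2^+\ d_3^+]$, $(D^-)^T = [d_1^-\ d_2^-\ d_3^-]$, $c$ holds the shipping costs of Figure 6.1, $c_+^T = [20\ 15\ 13]$, $c_-^T = [15\ 10\ 12]$, $B_1^T = [3\ 2\ 3]$, $B_2$ holds the two plant capacities, and

$$A_1 = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}$$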
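The penalty part of Example 6.9 can be evaluated for any proposed delivery schedule. The sketch below leaves out the shipping-cost term $c^T X$ and the plant-capacity constraints; it uses the demands 3, 2, 3 and the penalty figures quoted in the example, while the delivery vectors are hypothetical. Following the constraint $A_1X + D^+ - D^- = B_1$, $D^+$ measures shortfall and $D^-$ excess; the helper names are illustrative.

```python
def shortfall_and_excess(demand, delivered):
    """Split demand - delivered into (D+, D-): D+ is shortfall, D- is excess."""
    d_plus = [max(dm - dl, 0) for dm, dl in zip(demand, delivered)]
    d_minus = [max(dl - dm, 0) for dm, dl in zip(demand, delivered)]
    return d_plus, d_minus


def penalty(demand, delivered, under_cost, over_cost):
    """Weighted deviation c+^T D+ + c-^T D- (shipping costs not included)."""
    d_plus, d_minus = shortfall_and_excess(demand, delivered)
    return (sum(c * d for c, d in zip(under_cost, d_plus))
            + sum(c * d for c, d in zip(over_cost, d_minus)))


demand = [3, 2, 3]            # carloads per week at the three points
under_cost = [20, 15, 13]     # dollars per carload short
over_cost = [15, 10, 12]      # dollars per carload in excess

# Hypothetical deliveries totalling 9 carloads (both plants at capacity).
print(penalty(demand, [3, 3, 3], under_cost, over_cost))   # 10
print(penalty(demand, [4, 2, 3], under_cost, over_cost))   # 15
print(penalty(demand, [2, 2, 4], under_cost, over_cost))   # 32
```

With both plants at capacity one carload must be surplus somewhere, and the cheapest place to put it is the second point.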


EXERCISES


In exercises 1 through 10 a linear programming problem is given. Rewrite the problem as a standard maximum problem or a standard minimum problem as indicated.


1. Standard maximum: maximize $z = 2x + 3y$ subject to $5x - 6y \ge 7$, $7x + 8y \le 9$, $x, y \ge 0$.

2. Standard minimum: minimize $z = 2x + 3y$ subject to $5x - 6y \ge 7$, $7x + 8y \le 9$, $x, y \ge 0$.

3. Standard minimum: maximize $z = 2x + 3y$ subject to $5x - 6y \ge 7$, $7x + 8y \le 9$, $x \ge 0$.

4. Standard maximum: minimize $z = 2x + 3y$ subject to $5x - 6y \ge 7$, $7x + 8y \le 9$, $x, y \ge 0$.

5. Standard maximum: 6 Standard minimum: 
maximize = minimize = = 2x + 3y 
subject to 53 subject to Sx ~ 6y =7 

Tx + 8y <9 Tx +8 <9 
xy 20 xy20 

7. Standard maximum: maximize $z = 2x + 3y$ subject to $5x - 6y \le 7$, $7x + 8y \le 9$, $x \ge 0$.

8. Standard minimum: minimize $z = 2x + 3y$ subject to $5x - 6y \ge 7$, $7x + 8y \ge 9$, $x \ge 0$.

9. Standard maximum: maximize $z = 2x + 3y$ subject to $5x - 6y = 7$, $7x + 8y = 9$.

10. Standard minimum: minimize $z = 2x + 3y$ subject to $5x - 6y = 7$, $7x + 8y = 9$.

11. Standard maximum: minimize $z = |2x + 3y - 4|$.

12. Standard maximum: minimize $w = |2x + 3y - 4|$.

13. Standard maximum: minimize $z = |2x + 3y - 4| + |2x + 3y - 5|$.

14. Standard minimum: minimize $w = \max(|2x - 3|, |3x - 4|)$.

15. Standard maximum: maximize $z = 2x + 3y$ subject to $-10 \le x \le -1$.

16. Standard maximum: maximize $z = |2x + 3y|$ subject to $x \ge 4$, $y \ge -3$.

17, Standard maximum: 18 Standard maximum: 
minimize w = |4x + Sy} or ities 


minimize 


subject to «> =2, > = =3e ix <0 
subject to 2x + 3y <4 
19, Standard maximum 20. Standard maximum, 


-2v ifx<0 minimize z= 2)x| —3))| 
ax ify So subject 10 2x + 3y = 0 
0 


minimize 


subject to 2y + 3y 


21. A furniture factory makes two types of chairs. The first type takes 10 hours of labor to
make, uses 2 square yards of fabric and 20 pounds of padding. The second type takes 70 
hours of labor, uses 3 square yards of fabric and 10 pounds of padding. The profit on the 
first type is $2 per chair and the profit on the second type is $5 per chair. The resources 
available are 490 hours of labor, 32 yards of fabric, and 240 pounds of padding per day. 
Set up a linear programming problem to decide how many chairs of each type should be 
manufactured per day for maximum profit. 




22. A cereal manufacturer wishes to add vitamins A and B to his cereal, which lacks them. He has three vitamin mixtures available. Each gram of the first mixture thrown in the vat would add 1 grain of A and 7 grains of B to each serving at a cost of 4.9 cents per serving. Each gram of the second mixture would add 2 grains of A and 3 grains of B to each serving at a cost of 3.2 cents per serving. Each gram of the third mixture would add 2 grains of A and 1 grain of B to each serving at a cost of 2.4 cents per serving. He wishes to add at least 2 grains of A and 5 grains of B to each serving. Set up a linear programming problem to find out how many grams of each mixture should be tossed into the vat to achieve the minimum requirements at lowest cost.

23. The Natural High Fibre Health Bread Company ships waste sawdust from two sawmills ($S_1$, $S_2$) to three bakeries ($B_1$, $B_2$, $B_3$). $S_1$ produces 10 tons of sawdust per day and $S_2$ 5 tons. Bakeries 1, 2, and 3 need at least 2, 5, and 3 tons of sawdust per day to operate. The accompanying table gives the cost of shipping a ton of sawdust from sawmill $S_i$ to bakery $B_j$. Set up a linear programming problem to find a shipping schedule that minimizes the cost of shipping sawdust while giving each bakery its minimal requirement. (Let $x_{ij}$ = the amount, in tons, shipped from $S_i$ to $B_j$.)


24. For the sawmills and bakeries of exercise 23 assume that any unshipped sawdust must be burned at the sawmill at a cost of $2 per ton in air-pollution fines and any oversupply at the bakeries costs $1 per ton in labor to throw it over the back fence. Modify the linear programming problem of exercise 23 to take these costs into account.


In exercises 25 through 30 a curve-fitting problem is posed. For each problem find the 
arrays A and B of Proposition 6.2 or 6.3 as appropriate. How many variables and how 
many constraints (other than the positivity conditions) would normally be needed to state 
the problem as a standard maximum problem? 


25, Find the straight line that best fits the data in the J, sense, 
ain PH 
ylao 3 5 


26. Find the straight line that best fits the data in the /, sense. 




28. Find the function of the form 


fii) = 0x + BY +7 


a) t= 2 eed 
y 43 =! 0 -2 
ei 


that best approximates the data in the /, sense. 
29. Find the function of the form


fix.y) = 0x? + By + 


that best approximates the data in the /, sense 


30. Find the function of the form


wa/ivn ean Ayer +5 
gta eal Le 
y | Ds & 1 ' 

] o 4 < - 1 1 
w | 1 s -2 i) -! Oo 


that best approximates the data in the /, sense 


In exercises 31 through 35 use the technique of Example 6.8 to set up linear programming problems to answer the (occasionally obvious) given question.


31, (a) Does the system: 
talon 
wu af}=n 
have any solutions with x > 0, y > 0? 


(b) Does the line 


cut the first quadrant? 
32. Does the line 2x + 3y 


4 go through the rectangle 1 <x <2, 


<” 


33. Does the line $2x + 3y = 6$ go through the square with vertices $(-2, 3)$, $(-1, 3)$, $(-2, 2)$, and $(-1, 2)$?


6.2 Geometry of Linear Programming 395 


34. Does the line of intersection of the planes

$$x + 2y - 3z = 1, \qquad 2x + 4y - 7z = 0$$

cut through the positive octant?
35. Does the line of intersection of the two planes

$$x + 2y - 3z = 1, \qquad 2x + 4y - 7z = 0$$

pierce the cube with corners (3, 3, 3), (3, 4, 3), (4, 3, 3), (4, 4, 3), (3, 3, 2), (3, 4, 2), (4, 3, 2), and (4, 4, 2)?


6.2 The Geometry of Linear Programming 


This section develops the basic geometrical ideas pertinent to linear programming problems. First we record a few technical facts that will be needed later.


PROPOSITION 6.4
(a) Let $A$, $B$, $C$ be matrices with $B \le C$. If $A \ge 0$, then $AB \le AC$ whenever this product is defined. Similarly $BA \le CA$ whenever the product is defined.

Let $A$ be a matrix, $b$ a vector, and $\pi(T) = p + PT$, $T$ in $R^n$, a flat.

(b) $A\pi(T) \le b$ for all $T$ in $R^n$ implies $AP = 0$.
(c) $A\pi(0) < b$ implies $A\pi(T) < b$ for all $T$ with $\|T\| < \epsilon$ for some $\epsilon > 0$.
(d) $A\pi(T) \le b$ for all $T$ with $\|T\| < \epsilon$ implies $A\pi(T) \le b$ for $T$ with $\|T\| = \epsilon$ as well.


Proof
(a) We wish to show $(AB)[i;j] \le (AC)[i;j]$ for all $i$, $j$. By Proposition 2.1 it is sufficient to consider the case where $A$, $B$, $C$ are vectors and the product is the dot product. Since $B[k] \le C[k]$ and $A[k] \ge 0$ for all $k$, we have $A[k]B[k] \le A[k]C[k]$ and hence

$$\sum_k A[k]B[k] \le \sum_k A[k]C[k]$$

(b) From $A\pi(T) \le b$ we obtain $(AP)T \le b - Ap$, which reduces us to the case $AT \le b$ for all $T$ in $R^n$. If $A[i;j] \ne 0$, let $T[k] = 0$ for all $k \ne j$. Then $(AT)[i] = A[i;j]\,T[j] \le b[i]$, which forces either $T[j] \le b[i]/A[i;j]$ or $T[j] \ge b[i]/A[i;j]$, depending on the sign of $A[i;j]$. Neither restriction is allowed, since $T[j]$ is arbitrary.




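(c) If $v$, $w$ are vectors with $n$ nonnegative components, $\alpha = \lceil/v$ and $\beta = \lceil/w$, then

$$v^T w = \sum_k v[k]\,w[k] \le \sum_k \alpha\beta = n\alpha\beta$$

Now suppose that $b > 0$ has $\beta = \lfloor/b$, $A$ is a matrix with $\alpha = \lceil/,|A|$, and $T$ is a vector with $\tau = \lceil/|T|$ and $n$ components. Then by the above calculation, applied to the rows of $|A|$ and to $|T|$, we will have $AT < b$ as long as $\tau < \beta/n\alpha$, and this will be true if $\|T\| < \beta/n\alpha = \epsilon$. The case of $A\pi(T)$ reduces to this case by replacing $AP$ by $A$ and $b$ by $b - Ap$.

(d) Suppose that $\|T_0\| = \epsilon$ but $A\pi(T_0) \not\le b$. Then there is an index $i$ such that $b[i] < (A\pi(T_0))[i] = A[i;]\,\pi(T_0)$. Let $\pi'(T) = \pi(T + T_0)$. Then $\pi'(0) = \pi(T_0)$, so $(-A[i;])\,\pi'(0) < -b[i]$. By (c) there is a $\delta > 0$, which we may take smaller than $\epsilon$, such that $(-A[i;])\,\pi'(T) < -b[i]$ for $\|T\| < \delta$, which means that $b[i] < (A\pi(T_0 + T))[i]$ for $\|T\| < \delta$. But if $T_1 = T_0 - (\delta/2\epsilon)T_0$, then $b[i] < (A\pi(T_1))[i]$ with $\|T_1\| = \epsilon - \delta/2 < \epsilon$, which is a contradiction.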


In this section we will assume that our linear programming problem has the form

$$\begin{aligned} \text{maximize} \quad & c^T X \\ \text{subject to} \quad & AX \le b \end{aligned} \tag{6.9}$$

The techniques described in the last section may be used to put any linear programming problem in this form. Note particularly that the positivity condition $X \ge 0$ does not appear explicitly in (6.9). If the condition is present, then we replace the two conditions

$$AX \le b, \qquad X \ge 0$$

by the single condition

$$\begin{bmatrix} A \\ -I \end{bmatrix} X \le \begin{bmatrix} b \\ 0 \end{bmatrix}$$

The next proposition lies at the root of the geometric approach to linear programming (Figure 6.3).

FIGURE 6.3 (labels: $a$; $a^T X > b$; $a^T X < b$)




Proposition 6.5 Let $a$ be a column vector in $R^n$ and define $f: R^n \to R$ by $f(X) = a^T X$. Then

1. $a$ is perpendicular to the hyperplanes $f(X) = a^T X = \text{const}$.
2. $a$ points in the direction of increasing $f(X)$.


Proof Fix a constant $b$. Then $a^T X = b$ is one equation in $n$ unknowns and hence the solutions form a hyperplane [that is, an $(n - 1)$-flat] in $R^n$ by Proposition 4.17. By Proposition 4.23 the subspace $S$ parallel to this flat is $a^T X = 0$. Thus $a$ is perpendicular to $S$, and hence the component of $a$ parallel to the flat is zero (see Figure 5.19 and the accompanying discussion).

Now suppose that $a^T X_0 = b$ and we move from $X_0$ in the direction of $a$ to $X_0 + \epsilon a$ for some small $\epsilon$. Then

$$f(X_0 + \epsilon a) = a^T (X_0 + \epsilon a) = b + \epsilon \|a\|^2 > b$$

where $\epsilon > 0$. Thus any movement from $X_0$ in the direction of $a$ increases $f(X)$. Similarly any movement in the direction $-a$ decreases $f(X)$.


A hyperplane $a^T X = b$ divides $R^n$ into two half-spaces: the half-space $a^T X > b$ and the half-space $a^T X < b$ (Figure 6.3).

Using Proposition 6.5, we can dispose of most two-dimensional problems with 
a quick sketch. 


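EXAMPLE 6.10 Consider the linear programming problem

$$\begin{aligned} \text{maximize} \quad & z = x + 2y \\ \text{subject to} \quad & x + 3y \le 18 \\ & x + y \le 8 \\ & 2x + y \le 14 \\ & x \ge 0, \quad y \ge 0 \end{aligned}$$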


Each inequality defines a half-space (half-plane in this case). For example, $x + 3y \le 18$ is $f(X) = a^T X \le 18$, where $a^T = [1\ 3]$. The half-plane defined is on the side of the line $a^T X = 18$ opposite the direction that $a$ points. The inequality $x \ge 0$, on the other hand, is $f(X) = a^T X \ge 0$, where $a^T = [1\ 0]$. In this case the half-plane is the side of the line $a^T X = 0$ in the direction that $a$ points.

The set of points satisfying all five inequalities is the shaded set $S$ in Figure 6.4. The arrows on each hyperplane (line) point in the direction of increasing $f(X) = a^T X$; that is, they are parallel to $a$.

The dotted lines in Figure 6.4 are two typical hyperplanes

$$z = c^T X = x + 2y = \text{const.}$$




FIGURE 6.4

and the dotted arrows point in the direction of increasing $z$. From the sketch it is clear that the point of $S$ at which $z$ has its largest value is at the intersection of the lines $x + y = 8$ and $x + 3y = 18$, that is, at the point $(3, 5)$.
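For a problem as small as Example 6.10 the sketch can be checked by brute force: intersect every pair of constraint lines, keep the feasible intersection points, and evaluate $z$ at each. The Python below does this with exact fractions. It reads the third constraint as $2x + y \le 14$, which is consistent with the vertices $(6, 2)$ and $(7, 0)$ listed later in this section. The number of pairs grows quickly with the number of constraints, so this is only a check, not a practical method; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Example 6.10 in the form (6.9): rows of A, entries of b, objective c.
A = [[1, 3], [1, 1], [2, 1], [-1, 0], [0, -1]]
b = [18, 8, 14, 0, 0]
c = [1, 2]


def solve2(r1, r2, h1, h2):
    """Solve r1.X = h1, r2.X = h2 by Cramer's rule; None if the lines are parallel."""
    det = r1[0] * r2[1] - r1[1] * r2[0]
    if det == 0:
        return None
    x = Fraction(h1 * r2[1] - r1[1] * h2, det)
    y = Fraction(r1[0] * h2 - h1 * r2[0], det)
    return (x, y)


def feasible(p):
    return all(row[0] * p[0] + row[1] * p[1] <= h for row, h in zip(A, b))


vertices = set()
for i, j in combinations(range(len(A)), 2):
    p = solve2(A[i], A[j], b[i], b[j])
    if p is not None and feasible(p):
        vertices.add(p)

best = max(vertices, key=lambda p: c[0] * p[0] + c[1] * p[1])
print(sorted(vertices))
print(best, c[0] * best[0] + c[1] * best[1])
```

It finds the five vertices $(0, 0)$, $(0, 6)$, $(3, 5)$, $(6, 2)$, $(7, 0)$ and the maximum $z = 13$ at $(3, 5)$, in agreement with the sketch.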


Figure 6.4 illustrates several important ideas.

The first thing to notice is that the solution to the problem occurred out on the edge of the set of points $S$ that satisfy the inequalities. The set $S$ is bounded by hyperplanes, and the solution lies on two of them. Algebraically, at the solution point two of the inequalities are equalities.

The second observation we wish to make is a good deal subtler. Why does the maximum of $z = c^T X$ occur at the intersection of $a_1^T X = x + 3y = 18$ and $a_2^T X = x + y = 8$ instead of, say, the intersection of $a_2^T X = 8$ and $a_3^T X = 2x + y = 14$? The answer lies in the slope of the lines $z = c^T X = \text{const}$. If these lines were steeper, $c = [3\ 2]$ instead of $[1\ 2]$, say, then a sketch shows that the maximum would occur at the intersection of $a_2^T X = 8$ and $a_3^T X = 14$ [Figure 6.5(a)]. There is an intermediate slope, $c = [1\ 1] = a_2$, where the maximum occurs along the entire line segment at which $a_2^T X = 8$ intersects the set $S$.

Closer inspection shows that the relationship among the vectors $c$, $a_1$, $a_2$, and $a_3$ is the crucial factor. Figure 6.6 shows the two intersection points in greater detail with relevant vectors drawn from a common point. The maximum occurs when $c$ lies between the vectors $a_i$ at the intersection. For $c = [1\ 2]$, as in Figure 6.4, the maximum occurs at the first intersection [Figure 6.6(a)], since $c$ is between $a_1$ and $a_2$. For $c = [3\ 2]$, on the other hand, the maximum occurs at the second intersection, where $c$ is between $a_2$ and $a_3$ [Figure 6.6(b)]. For the intermediate case $c = [1\ 1] = a_2$, $c$ is both “between” $a_1$ and $a_2$ and “between” $a_2$ and $a_3$.

FIGURE 6.5

The notion of “between” has a simple algebraic characterization. If $a_1$, $a_2$, for example, were used to define a new coordinate system, then $c$ in Figure 6.6(a) would be in the first quadrant. That is, $c = \alpha_1 a_1 + \alpha_2 a_2$ with $\alpha_1, \alpha_2 \ge 0$. In Figure 6.6(b), on the other hand, $c$ is not in the first quadrant of the coordinate system defined by $a_1$ and $a_2$. In fact $c = \alpha_1 a_1 + \alpha_2 a_2$ with $\alpha_1 < 0$.

FIGURE 6.6 (panels (a) and (b))

In the plane a vector $v$ is between two vectors $w_1$ and $w_2$ if $v = \alpha_1 w_1 + \alpha_2 w_2$ with $\alpha_1 \ge 0$, $\alpha_2 \ge 0$. Setting $A = [w_1 | w_2]$, this means that $v$ is between $w_1$ and $w_2$ if the linear system

$$AX = v$$

has a solution with $X \ge 0$.
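The algebraic test for “between” is a 2 × 2 linear solve followed by a sign check. In this Python sketch (helper names are illustrative), `coefficients` applies Cramer's rule and `is_between` checks $X \ge 0$. The vectors $a_1 = [1\ 3]$, $a_2 = [1\ 1]$, $a_3 = [2\ 1]$ come from the first three constraints of Example 6.10, and the output reproduces the three cases discussed above.

```python
from fractions import Fraction


def coefficients(v, w1, w2):
    """Solve v = alpha1*w1 + alpha2*w2 by Cramer's rule; None if w1, w2 are parallel."""
    det = w1[0] * w2[1] - w2[0] * w1[1]
    if det == 0:
        return None
    alpha1 = Fraction(v[0] * w2[1] - w2[0] * v[1], det)
    alpha2 = Fraction(w1[0] * v[1] - v[0] * w1[1], det)
    return alpha1, alpha2


def is_between(v, w1, w2):
    """v is between w1 and w2 when AX = v, A = [w1 | w2], has a solution X >= 0."""
    alphas = coefficients(v, w1, w2)
    return alphas is not None and all(a >= 0 for a in alphas)


a1, a2, a3 = (1, 3), (1, 1), (2, 1)
print(*coefficients((1, 2), a1, a2))    # 1/2 1/2
print(is_between((1, 2), a1, a2))       # True
print(is_between((3, 2), a1, a2))       # False (alpha1 = -1/2)
print(is_between((3, 2), a2, a3))       # True
print(is_between((1, 1), a1, a2), is_between((1, 1), a2, a3))   # True True
```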
We proceed to generalize Figure 6.6 to higher dimensions. First a bit of terminology.


DEFINITION 6.5 Let $A$ be a matrix. The dual cone of $A$ is the set of vectors $X$ for which $A^T X \le 0$. If $b$ is a column vector, the dual cone of $b$ is the set of vectors $X$ for which $b^T X \le 0$.

Translate the coordinate system in Figure 6.6(a) so that the lines intersect at the origin. If $A = [a_1 | a_2]$, then the matrix inequality $A^T X \le 0$ is the pair of scalar inequalities

$$a_1^T X \le 0, \qquad a_2^T X \le 0$$

and the set $S$ of Figure 6.6(a) is the dual cone of $A$. The half-plane below the dotted line is the dual cone of $c$. The vector $c$ is between $a_1$ and $a_2$ if and only if the set $S$ is contained in the half-plane below the dotted line [compare Figure 6.6(b)]. The next proposition generalizes this to higher dimensions. Notice that the problem involved is precisely that of Example 6.8.




Proposition 6.6 (Farkas' lemma) Let $A$ be a matrix and $c$ a vector. The equation $AX = c$ has a solution $X$ with $X \ge 0$ if and only if the dual cone of $A$ is contained in the dual cone of $c$.


Proof First assume that we have a solution $AX = c$ with $X \ge 0$. Let $v$ be any vector in the dual cone of $A$, that is, $A^T v \le 0$. Since $X \ge 0$ we have, by Proposition 6.4,

$$0 \ge X^T A^T v = (AX)^T v = c^T v$$

and hence $v$ is in the dual cone of $c$.

We will prove the converse by induction on the number of columns of A. The 
proof is quite simple in concept and may be turned into an algorithm for comput- 
ing positive solutions of AX = c. 

We begin with the induction step. Assume that the proposition is true for all matrices with fewer than $n$ columns, and that $A$ has $n$ columns and a dual cone contained in the dual cone of $c$. Let $A' = 0\ 1 \downarrow A$, that is, $A$ with its first column $a = A[;1]$ dropped.

Now if the dual cone of $A'$ is contained in the dual cone of $c$, then we are done. For by the induction hypothesis there is an $X' \ge 0$ such that $A'X' = c$, and hence $[0, X']$ is a solution of $AX = c$ with $X \ge 0$. Thus we may assume that the dual cone of $A'$ is not contained in the dual cone of $c$; that is, there is a vector $v$ such that $A'^T v \le 0$ but $c^T v > 0$ (see Figure 6.7, where $A = [a | a_1]$, $A' = [a_1]$). Let $P$ be any square matrix whose null space is $S$, the subspace generated by $a$ ($P = I - P_S$, where $P_S$ is perpendicular projection onto $S$, is an obvious choice, but many other choices are possible). Since the dual cone of $A$ is contained in the dual cone of $c$, it follows that the dual cone of $PA'$ is contained in the dual cone of $Pc$. For suppose that $(PA')^T w \le 0$. Then


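$$A^T (P^T w) = [a | A']^T P^T w = \begin{bmatrix} a^T P^T w \\ A'^T P^T w \end{bmatrix} = \begin{bmatrix} (Pa)^T w \\ (PA')^T w \end{bmatrix} = \begin{bmatrix} 0 \\ (PA')^T w \end{bmatrix} \le 0$$

FIGURE 6.7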




Thus $P^T w$ is in the dual cone of $A$ and hence in the dual cone of $c$ as well. This gives the inequality

$$0 \ge c^T P^T w = (Pc)^T w$$

which shows that $w$ is in the dual cone of $Pc$. The induction hypothesis now shows, since $PA'$ has $n - 1$ columns, that there is an $X' \ge 0$ such that

$$(PA')X' = Pc \quad \text{or} \quad P(c - A'X') = 0$$


Since the null space of $P$ is generated by $a$, there is a scalar $x$ such that (cf. Figure 6.7) $c - A'X' = xa$, or $[a | A'][x\ X']^T = c$. The vector $[x\ X']^T$ is the solution sought. We must check that $x \ge 0$. This follows from the existence of the vector $v$ above. In fact, multiplying $c - A'X' = xa$ by $v^T$ gives

$$v^T c - v^T A'X' = x\, v^T a$$

or

$$c^T v - (A'^T v)^T X' = x\,(a^T v)$$

Now $c^T v > 0$, $A'^T v \le 0$, and $X' \ge 0$. Thus $x(a^T v) > 0$. Now $a^T v > 0$, for if $a^T v \le 0$ then $A^T v = [a | A']^T v \le 0$ and, since the dual cone of $A$ is contained in the dual cone of $c$, we would have $c^T v \le 0$, which is not true. Thus $a^T v > 0$, and so $x > 0$.

It remains to start the induction. Now the dual cone of $A$ is the intersection of the dual cones of the vectors $A[;i]$. If $A$ has no columns ($0 = 1\downarrow\rho A$), we take the dual cone of $A$ to be the whole space containing $c$. Thus for $n = 0$ the dual cone of $c$ contains the dual cone of $A$ only when $c = 0$. In this case the solution of $AX = c$ exists and is the empty vector $\iota 0$, which we take to be nonnegative.
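Farkas' lemma can be checked numerically in the simplest case of two linearly independent columns in the plane. There the dual cone of $A = [a_1 | a_2]$ is generated by the two rays $-(A^T)^{-1}e_i$, so containment in the dual cone of $c$ only needs to be tested on those rays. The sketch below is not the inductive algorithm of the proof, and its helper names are hypothetical. It compares the ray test with a direct check for a solution $X \ge 0$ of $AX = c$ over a small grid of vectors $c$.

```python
from fractions import Fraction


def inverse2(M):
    """Inverse of a 2 x 2 matrix [[p, q], [r, s]] (assumed invertible), exactly."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[Fraction(s, det), Fraction(-q, det)],
            [Fraction(-r, det), Fraction(p, det)]]


def has_nonnegative_solution(a1, a2, c):
    """First side of Farkas' lemma: does [a1 | a2] X = c have a solution X >= 0?"""
    inv = inverse2([[a1[0], a2[0]], [a1[1], a2[1]]])
    X = [inv[i][0] * c[0] + inv[i][1] * c[1] for i in range(2)]
    return all(x >= 0 for x in X)


def dual_cone_contained(a1, a2, c):
    """Second side: is {X : a1.X <= 0, a2.X <= 0} inside {X : c.X <= 0}?

    For independent a1, a2 that cone is generated by the rays -(A^T)^(-1) e_i,
    so it suffices to test c on those two rays.
    """
    inv_t = inverse2([list(a1), list(a2)])        # the rows of A^T are a1 and a2
    rays = [(-inv_t[0][i], -inv_t[1][i]) for i in range(2)]
    return all(c[0] * r[0] + c[1] * r[1] <= 0 for r in rays)


pairs = [((1, 3), (1, 1)), ((1, 1), (2, 1)), ((2, -1), (1, 4))]
for a1, a2 in pairs:
    for c in [(x, y) for x in range(-3, 4) for y in range(-3, 4)]:
        assert has_nonnegative_solution(a1, a2, c) == dual_cone_contained(a1, a2, c)

print(has_nonnegative_solution((1, 3), (1, 1), (1, 2)))   # True
print(has_nonnegative_solution((1, 3), (1, 1), (3, 2)))   # False
```

In this special case the two tests agree for a simple reason: $c^T$ applied to the $i$th ray equals $-(A^{-1}c)[i]$.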


In Figure 6.4 the set $S$ is the set of points that satisfy the matrix inequality $AX \le b$ of (6.9). Some special terminology is associated with this set.


DEFINITION 6.6
1. A point satisfying the matrix inequality

$$AX \le b$$

of (6.9) is called a feasible point.
2. The individual inequalities of (6.9)

$$A[i;]\,X \le b[i]$$

(dot products) are called constraints.
3. Let $v$ be a feasible point. The constraints

$$A[i;]\,v \le b[i]$$

for which equality holds are called the active constraints at $v$.
4. Let $v$ be a feasible point. The flat obtained by intersecting the hyperplanes

$$A[i;]\,X = b[i]$$

as $i$ runs through the active constraints at $v$ will be called the constraint flat for $v$ and denoted $\pi_v$.


Let $I$ be the vector of indices of the active constraints at $v$. Then $\pi_v$ is the flat of solutions of $A[I;]X = b[I]$. If the columns of $V$ span the null space of $A[I;]$, then $\pi_v$ can be parametrized as

$$\pi_v(T) = v + VT$$

(Proposition 4.23).

Referring to Figure 6.4, the set of feasible points is the set $S$. The points in the interior of $S$ have no active constraints. This means that the matrix $A[I;]$ has no rows ($0 = 1\uparrow\rho A[I;]$), and so every $X$ in $R^2$ is a solution: $\pi_v$ is all of $R^2$.

For points on the boundary of $S$ there is at least one active constraint, and at the vertices [(0, 0), (0, 6), (3, 5), (6, 2), and (7, 0)] there are two active constraints.

At $(3, 5)$ the active constraints are

$$x + 3y = 18, \qquad x + y = 8$$

and the unique solution of this system is $(3, 5)$. The constraint flat of $(3, 5)$ is just $(3, 5)$ itself. (In this case $V$ has no columns; that is, $0 = 1\downarrow\rho V$.)

In general we define a vertex to be a feasible point whose constraint flat is the point itself, that is, has dimension equal to 0.

Any point on the line segment from $(3, 5)$ to $(6, 2)$, endpoints excluded, has the single active constraint

$$x + y = 8$$

and this line is the constraint flat. It may be parametrized

$$\pi_v(t) = v + t \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

for any $v$ on this line segment.
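Active constraints and the dimension of a constraint flat are easy to compute for Example 6.10. In this Python sketch (hypothetical helper names; indices are 0-based rather than the 1-based indices of the text), the dimension of $\pi_v$ is $n$ minus the rank of $A[I;]$, with the rank found by Gaussian elimination over the rationals. It reports dimension 0 at the vertex $(3, 5)$, dimension 1 at $(9/2, 7/2)$ on the edge $x + y = 8$, and dimension 2 at the interior point $(1, 1)$.

```python
from fractions import Fraction

# Constraints of Example 6.10 written as AX <= b, positivity rows included.
A = [[1, 3], [1, 1], [2, 1], [-1, 0], [0, -1]]
b = [18, 8, 14, 0, 0]


def active(v):
    """0-based indices of the constraints that hold with equality at v."""
    return [i for i, (row, h) in enumerate(zip(A, b))
            if sum(r * x for r, x in zip(row, v)) == h]


def rank(rows):
    """Rank of a list of rows, by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def flat_dimension(v):
    """Dimension of the constraint flat of v: n minus the rank of A[I;]."""
    return len(v) - rank([A[i] for i in active(v)])


for v in [(3, 5), (Fraction(9, 2), Fraction(7, 2)), (1, 1)]:
    print(v, active(v), flat_dimension(v))
```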
An important property of the feasible set of a linear programming problem is convexity. Figures 6.8(a) and (b) give examples of sets that are convex and nonconvex, respectively.

FIGURE 6.8 (panels (a) and (b))


DEFINITION 6.7 A set $S$ in $R^n$ is convex if for every pair of points $p$, $q$ in $S$ the line segment

$$\pi(t) = p + t(q - p), \qquad 0 \le t \le 1$$

lies in $S$.

PROPOSITION 6.7 The set of solutions of a system of linear inequalities is convex.

Proof Let the system be $AX \le b$ and suppose $p$, $q$ are solutions. Let

$$\pi(t) = p + t(q - p) = (1 - t)p + tq, \qquad 0 \le t \le 1$$

Then $1 - t \ge 0$ and $t \ge 0$; thus

$$A\pi(t) = (1 - t)Ap + tAq \le (1 - t)b + tb = b$$

Referring again to Figure 6.4, the points without active constraints are characterized by their being surrounded on all sides by feasible points. The points on the edges, on the other hand, are not completely surrounded by feasible points. If such a point is not a vertex, however, it does lie within a line segment of feasible points. Vertices do not even lie within a line segment of feasible points (they do appear as endpoints of such segments). The next proposition is a technical expression of this idea.


Proposition 6.8 Let $v$ be a feasible point of a system of linear inequalities with constraint flat $\pi_v(T) = v + VT$. There is a number $\epsilon > 0$ such that $\pi_v(T)$ is a feasible point with the same active constraints as $v$ for $\|T\| < \epsilon$.




Proof Let the index vectors $I$ and $J$ give the active and inactive constraints, respectively. By the definition of $\pi_v$ we have $A[I;]\pi_v(T) = b[I]$ for all $T$; and since $A[J;]\pi_v(0) < b[J]$, Proposition 6.4(c) shows that $A[J;]\pi_v(T) < b[J]$ for $\|T\| < \epsilon$ for some $\epsilon > 0$.


For the most part the solutions of linear programming problems occur at vertices. A proof of this is complicated by the fact that general linear programming problems need not have solutions and their feasible sets need not have vertices (e.g., maximize $z = y$ subject to $x \le 0$ in $R^2$). The next two propositions show the general situation.


Proposition 6.9 Let $S$ be the set of solutions of a system of linear inequalities

$$AX \le b$$

(a) If $S$ contains an entire flat $\pi$, then every constraint flat $\pi_v$ contains the parallel translate of $\pi$ through $v$, and this translate is contained in $S$.

(b) Let $v$ be a point of $S$ such that the dimension of $\pi_v$ is minimal among constraint flats. Then $\pi_v$ lies entirely in $S$.


Proof Suppose that the flat $\pi(T) = p + PT$ lies entirely in $S$. Then $A\pi(T) \le b$ for all $T$, and hence $AP = 0$ by Proposition 6.4(b). Let $v$ be any point of $S$. The translate of $\pi$ through $v$ may be parametrized as $\pi'(T) = v + PT$ (Proposition 4.23), and so the computation

$$A(v + PT) = Av + APT = Av + 0 \le b$$

shows $\pi'(T)$ feasible for all $T$. Similarly, if the index vector $I$ gives the active constraints at $v$, then $A[I;]v = b[I]$, and so $A[I;](v + PT) = A[I;]v = b[I]$ shows that $v + PT$ satisfies the equations defining $\pi_v$.


Now assume that the constraint flat $\pi_v(T) = v + PT$ has minimal dimension for $v$ in $S$. If there is a parameter vector $T_0$ such that $p = \pi_v(T_0)$ is not in $S$, consider the line $\pi(t) = v + t(p - v) = v + P(tT_0)$. The set of real numbers $r$ for which $\pi(t)$ is feasible for all $|t| < r$ is nonempty by Proposition 6.8 and bounded above by 1, since $\pi(1) = p$ is not feasible. If $r_0$ is the least upper bound of this set of real numbers, then by Proposition 6.4(d) the points $\pi(\pm r_0)$ are feasible. Since $S$ is convex, we have, by changing the sign of $T_0$ if necessary, that $q = \pi(r_0)$ is feasible, but $\pi(r_0 + \delta)$ is not for $\delta > 0$. Now if $\pi_q = \pi_v$, then by Proposition 6.8, $\pi_q(T) = q + PT = v + P(r_0 T_0 + T)$ is feasible for $\|T\| < \epsilon$, $\epsilon > 0$. Taking $T = \delta T_0$ for $\delta > 0$ sufficiently small, we see that this is impossible. Thus $\pi_q \ne \pi_v$. Since $q$ is contained in $\pi_v$, however, we have $\pi_q$ contained in but not equal to $\pi_v$, and hence $\pi_v$ does not have minimal dimension. This contradiction shows that $T_0$ does not exist and hence $\pi_v$ is contained in $S$.


Proposition 6.10 Suppose that the linear programming problem (6.9) has solutions. Then the set of solutions always includes a point $v$ for which the constraint flat $\pi_v$ has minimal dimension. For any solution point $v$, $\pi_v$ is parallel to the hyperplanes $c^T X = \text{const}$.


Proof Suppose that $S$ is the feasible set of (6.9) and that the maximum value attained by $f(X) = c^TX$ is $\theta$. Then the set of solutions is the intersection $S'$ of $S$ with the hyperplane $c^TX = \theta$. Hence $S'$ is the solution set of the system of linear inequalities obtained by adding the constraints $c^TX \le \theta$ and $-c^TX \le -\theta$ to $AX \le b$.

Let the point $v$ in $S'$ have constraint flat $\pi_v$ when considered to be a point of $S$. By Proposition 6.8 there is an $\varepsilon > 0$ such that $\pi_v(T)$ lies in $S$ for $\|T\| < \varepsilon$. In fact, $\pi_v(T)$ must lie in $S'$ for $\|T\| < \varepsilon$, for

$$c^T\pi_v(\pm T) = c^T(v \pm PT) = c^Tv \pm (c^TP)T = \theta \pm (c^TP)T$$

will be greater than $\theta$ for the proper choice of sign unless $c^TP = 0$, that is, unless $\pi_v$ is parallel to $c^TX = \text{const}$.

Thus the constraint flat $\pi_v$ is the same whether $v$ is considered a point of $S$ or a point of $S'$. Let $\pi_v$, $v$ in $S'$, have minimal dimension. Applying Proposition 6.9, we have that $\pi_v$ lies entirely in $S'$, hence entirely in $S$, and hence every constraint flat of $S$ contains a translate of $\pi_v$. Thus $\pi_v$ has minimal dimension among the constraint flats of $S$.


Now let us return to the idea (Figure 6.6 and the accompanying discussion) that at a solution of (6.9) the vector $c$ is a positive linear combination of the vectors associated with the active constraints. Assume that $v$ is a solution and the index vector $I$ gives the active constraints at $v$. We should have a solution, then, of the system

$$A[I;]^TY = c,\qquad Y \ge 0$$

We can drop reference to $I$ by assuming that the components of $Y$ corresponding to inactive constraints are zero. Then we have a solution of

$$A^TY = c,\qquad Y \ge 0$$

Note that if $w$ is any solution of this latter system and $s$ is any feasible point, then

$$f(s) = c^Ts = (A^Tw)^Ts = w^TAs \le w^Tb = b^Tw$$

where the inequality is justified by Proposition 6.4(a). Thus $b^Tw$ is an upper bound on the values of $f(X) = c^TX$.

This leads to the next proposition — which shows, incidentally, that any algo- 
rithm that can find a point of a feasible set can in fact solve linear programming 
problems. 


Proposition 6.11 The vector $v$ is a solution of

$$\begin{aligned}\text{maximize}\quad & z = c^TX\\ \text{subject to}\quad & AX \le b\end{aligned}\qquad (6.9)$$

if and only if there is a vector $w$ such that $\begin{bmatrix}X\\ Y\end{bmatrix} = \begin{bmatrix}v\\ w\end{bmatrix}$ is a solution of the system of linear inequalities

$$\begin{aligned} AX &\le b\\ A^TY &= c,\qquad Y \ge 0\\ c^TX &= b^TY\end{aligned}\qquad (6.10)$$

In this case the components of $w$ corresponding to inactive constraints at $v$ are zero [i.e., $0 = w \times (b - Av)$].


Proof Given $\begin{bmatrix}v\\ w\end{bmatrix}$ a solution of (6.10), we saw above that $f(s) \le b^Tw = c^Tv = f(v)$ for every feasible point $s$, so $f(v)$ is maximal and $v$ is a solution of (6.9). Since

$$w^Tb = b^Tw = c^Tv = (A^Tw)^Tv = w^T(Av)$$

we have $w^T(b - Av) = 0$. Since $w \ge 0$ and $b - Av \ge 0$, this dot product is a sum of nonnegative terms, which must then be individually zero, that is, $0 = w \times (b - Av)$.

Conversely, suppose that $v$ is a solution of (6.9). Let the index vector $I$ give the active constraints at $v$ and let the index vector $J$ give the inactive constraints at $v$. Since we are assuming that a solution exists, it follows from Proposition 6.10 that $I$ is not empty, that is, there are active constraints. Now $c^Tv$ is the maximal value of $c^TX$ on the larger set, $S_I$, of solutions of $A[I;]X \le b[I]$. For suppose that $v'$ is in $S_I$ and $c^Tv' > c^Tv$. For the line segment $\lambda(t) = (1 - t)v + tv'$ we have $c^T\lambda(t) = c^Tv + t(c^Tv' - c^Tv) > c^Tv$ for all $t > 0$. But $\lambda(t)$ lies in $S_I$ by Proposition 6.7, so $A[I;]\lambda(t) \le b[I]$ for $0 \le t \le 1$. Now $A[J;]\lambda(0) < b[J]$ implies that for $\varepsilon$ sufficiently small, $A[J;]\lambda(\varepsilon) < b[J]$. But this means that for some small $\varepsilon > 0$, $A\lambda(\varepsilon) \le b$ and $c^T\lambda(\varepsilon) > c^Tv$, which means that $c^Tv$ is not maximal. This contradiction shows that we cannot have $c^Tv' > c^Tv$.

Now if $v'$ is any vector such that $A[I;](v' - v) \le 0$, then $A[I;]v' \le A[I;]v = b[I]$, and hence $c^Tv' \le c^Tv$, that is, $c^T(v' - v) \le 0$. This shows that the dual cone of $A[I;]^T = A^T[;I]$ is contained in the dual cone of $c$, and hence by Farkas' lemma (Proposition 6.6) applied to $A^T[;I]$ there is a vector $w_I \ge 0$ such that $A^T[;I]w_I = c$. If we define $w$ by $w[I] = w_I$, $w[J] = 0$, then $A^Tw = c$ and $w \ge 0$. Further, if $e = b - Av$, then $e[I] = 0$, and hence $e^Tw = 0$. Thus

$$b^Tw = (Av + e)^Tw = (Av)^Tw + e^Tw = v^T(A^Tw) = v^Tc = c^Tv$$

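EXAMPLE 6.11 Consider again the system of Example 6.10:

$$\begin{aligned}\text{maximize}\quad & z = \begin{bmatrix}1 & 2\end{bmatrix}X\\ \text{subject to}\quad & \begin{bmatrix}1 & 3\\ 1 & 1\\ 2 & 1\\ -1 & 0\\ 0 & -1\end{bmatrix}X \le \begin{bmatrix}18\\ 8\\ 14\\ 0\\ 0\end{bmatrix}\end{aligned}$$

We have seen (Figure 6.4) that the solution to this problem is $v = (3, 5)$. The normals to the active constraints at $v$ are just $[1\ 3]$ and $[1\ 1]$, and the vector $c$ lies between these normals (Figure 6.6(a)). By Farkas' lemma there will be a nonnegative solution of

$$\begin{bmatrix}1 & 1\\ 3 & 1\end{bmatrix}\begin{bmatrix}y_1\\ y_2\end{bmatrix} = \begin{bmatrix}1\\ 2\end{bmatrix}$$

and indeed

$$\begin{bmatrix}1 & 1\\ 3 & 1\end{bmatrix}\begin{bmatrix}1/2\\ 1/2\end{bmatrix} = \begin{bmatrix}1\\ 2\end{bmatrix}$$

The vector $w = Y$ of Proposition 6.11 is just this vector padded to the proper length with zeros, one zero for each inactive constraint:

$$w = (1/2,\ 1/2,\ 0,\ 0,\ 0)$$

For this $v$ and $w$ we have

$$c^Tv = \begin{bmatrix}1 & 2\end{bmatrix}\begin{bmatrix}3\\ 5\end{bmatrix} = 13,\qquad b^Tw = \begin{bmatrix}18 & 8 & 14 & 0 & 0\end{bmatrix}\begin{bmatrix}1/2\\ 1/2\\ 0\\ 0\\ 0\end{bmatrix} = 13$$

so indeed $c^Tv = b^Tw$. This equation, $c^TX = b^TY$, serves to rule out such solutions of $A^TY = c$, $Y \ge 0$, as $Y = \frac{1}{5}[3\ 0\ 1\ 0\ 0]$. Geometrically this latter solution comes from the point of intersection of the first constraint ($x + 3y = 18$) and the third constraint ($2x + y = 14$); at that point the dual-cone condition of Farkas' lemma is satisfied, but the point falls outside the feasible set $S$. For this value of $Y$ we have

$$b^TY = \begin{bmatrix}18 & 8 & 14 & 0 & 0\end{bmatrix}\frac{1}{5}\begin{bmatrix}3\\ 0\\ 1\\ 0\\ 0\end{bmatrix} = \frac{68}{5} = 13.6 > 13$$

which illustrates the statement from the discussion preceding Proposition 6.11 that $AX \le b$ and $A^TY = c$, $Y \ge 0$, imply $b^TY \ge c^TX$.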
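The numbers in this example can be checked mechanically. The following Python sketch is not part of the book's APL material; the helper names `dot` and `is_dual_feasible` are illustrative. It uses exact rational arithmetic on the data above to confirm that both $w$ and the second vector $Y$ satisfy $A^TY = c$, $Y \ge 0$, while only $w$ also satisfies $b^TY = c^Tv$.

```python
from fractions import Fraction as F

# Example 6.10/6.11: maximize c^T X subject to A X <= b
A = [[1, 3], [1, 1], [2, 1], [-1, 0], [0, -1]]
b = [18, 8, 14, 0, 0]
c = [1, 2]

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))

def is_dual_feasible(Y):
    """True when A^T Y = c and Y >= 0."""
    AtY = [sum(A[i][j] * Y[i] for i in range(len(A))) for j in range(len(c))]
    return AtY == c and all(y >= 0 for y in Y)

v = [3, 5]                          # solution of the maximum problem
w = [F(1, 2), F(1, 2), 0, 0, 0]     # w of Proposition 6.11
Y = [F(3, 5), 0, F(1, 5), 0, 0]     # (1/5)[3 0 1 0 0]

print(all(dot(row, v) <= bi for row, bi in zip(A, b)))   # v is feasible: True
print(is_dual_feasible(w), dot(c, v), dot(b, w))        # True 13 13
print(is_dual_feasible(Y), dot(b, Y))                   # True 68/5
```

Exact fractions are used so that the equality $c^Tv = b^Tw$ is tested exactly rather than up to rounding.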
EXAMPLE 6.12 


minimize 
subject to 


This is a minimum problem but not a standard one. We rewrite it as a maximum problem:

$$\begin{aligned}\text{maximize}\quad & -u = -x - y\\ \text{subject to}\quad & -x - 3y \le 3\\ & 2x - y \le 2\\ & -x + y \le 2\end{aligned}$$

Since

$x + 3y = -3$ goes through $(0, -1)$ and $(-3, 0)$
$2x - y = 2$ goes through $(0, -2)$ and $(1, 0)$
$-x + y = 2$ goes through $(0, 2)$ and $(-2, 0)$

the feasible set is the shaded triangle in Figure 6.9. The direction of increasing $-u$ is (Proposition 6.5) $c = (-1, -1)$. Thus the solution would appear to be the intersection of the lines $-x - 3y = 3$ and $-x + y = 2$.


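At this point we have

$$\alpha\begin{bmatrix}-1\\ -3\end{bmatrix} + \beta\begin{bmatrix}-1\\ 1\end{bmatrix} = \begin{bmatrix}-1\\ -1\end{bmatrix}\qquad\text{or}\qquad \begin{bmatrix}\alpha\\ \beta\end{bmatrix} = \begin{bmatrix}1/2\\ 1/2\end{bmatrix}$$

which is indeed positive. The solution of the problem is the point of intersection of these two lines, $(x, y) = (-9/4, -1/4)$.

The vector $w$ of Proposition 6.11 is $(\alpha, \beta)$ padded with a zero for the inactive constraint $2x - y \le 2$, so $w = (1/2, 0, 1/2)$. The value of the objective function is

$$-u = b^Tw = \begin{bmatrix}3 & 2 & 2\end{bmatrix}\begin{bmatrix}1/2\\ 0\\ 1/2\end{bmatrix} = \frac{5}{2}$$

or $u = -5/2$.

FIGURE 6.9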
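These values can be verified with exact arithmetic. The Python sketch below is only an illustration (the helper `solve2` is a hypothetical name, not from the text): it applies Cramer's rule twice, once for the intersection of the two active constraint lines and once for the multipliers $\alpha$, $\beta$, and then compares $-u$ at the vertex with $b^Tw$.

```python
from fractions import Fraction as F

def solve2(M, r):
    """Solve the 2-by-2 system M z = r exactly by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [F(r[0] * M[1][1] - M[0][1] * r[1], det),
            F(M[0][0] * r[1] - r[0] * M[1][0], det)]

# Active constraints at the solution: -x - 3y <= 3 and -x + y <= 2
x, y = solve2([[-1, -3], [-1, 1]], [3, 2])
# Multipliers: alpha*(-1, -3) + beta*(-1, 1) = c = (-1, -1)
alpha, beta = solve2([[-1, -1], [-3, 1]], [-1, -1])

b = [3, 2, 2]
w = [alpha, 0, beta]              # padded with 0 for the inactive 2x - y <= 2
print(x, y, alpha, beta)          # -9/4 -1/4 1/2 1/2
print(2 * x - y <= 2, -x - y, sum(bi * wi for bi, wi in zip(b, w)))   # True 5/2 5/2
```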


In Section 6.1 we said that there was a close connection between the solutions to dual problems. The relation between the two follows from Proposition 6.11.


Proposition 6.12 Consider the dual problems

$$\begin{array}{llll}\text{maximize} & z = c^TX & \text{minimize} & w = b^TY\\ \text{subject to} & AX \le b & \text{subject to} & A^TY \ge c\\ & X \ge 0 & & Y \ge 0\end{array}$$

If $v$ is a feasible point for the maximum problem and $u$ is a feasible point for the minimum problem, then

$$c^Tv \le b^Tu$$

with equality if and only if $v$ and $u$ are solutions of their respective problems.

In this case the components of $u$ corresponding to the inactive constraints (exclusive of positivity constraints) at $v$ are zero, and the components of $v$ corresponding to the inactive constraints (exclusive of positivity constraints) at $u$ are zero.


Proof Suppose that $u$ and $v$ are both feasible. Then $u \ge 0$, $v \ge 0$, $A^Tu \ge c$, and $(Av)^T \le b^T$. Hence, using Proposition 6.4,

$$b^Tu \ge (Av)^Tu = v^TA^Tu \ge v^Tc = c^Tv$$

Applying Proposition 6.11 to the problem of maximizing $z = c^TX$ subject to

$$\begin{bmatrix}A\\ -I\end{bmatrix}X \le \begin{bmatrix}b\\ 0\end{bmatrix}$$

we find that $v$ is a solution of the maximum problem if and only if there is a vector $Y = [u_1\ u_2] \ge 0$ such that

$$\begin{bmatrix}A^T & -I\end{bmatrix}\begin{bmatrix}u_1\\ u_2\end{bmatrix} = A^Tu_1 - u_2 = c$$

and $c^Tv = [b^T\ 0]\,[u_1\ u_2]^T = b^Tu_1$. Thus suppose that $u$ and $v$ are feasible and that $c^Tv = b^Tu$. Then set $u_1 = u$ and $u_2 = A^Tu - c \ge 0$, and it follows that $v$ is a solution of the maximum problem. Further, the components of $Y = [u_1\ u_2]$ corresponding to the inactive constraints at $v$ are zero. Since the components of $Y$ that are components of $u_2$ correspond to the positivity constraints, the components of $u$ that correspond to inactive constraints exclusive of positivity constraints are zero.

To prove the corresponding statement with the roles of $u$ and $v$ interchanged, write the minimum problem as the maximum problem:

$$\begin{aligned}\text{maximize}\quad & -b^TY\\ \text{subject to}\quad & -A^TY \le -c\\ & Y \ge 0\end{aligned}$$

Then the dual problem is

$$\begin{aligned}\text{minimize}\quad & -c^TX\\ \text{subject to}\quad & -AX \ge -b\\ & X \ge 0\end{aligned}$$

and the roles of $u$ and $v$ have been interchanged.


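EXAMPLE 6.13 Writing the system of Example 6.10 as a standard maximum problem, we have the dual problems:

$$\begin{array}{llll}\text{maximize} & z = [1\ \ 2]X & \text{minimize} & w = [18\ \ 8\ \ 14]Y\\[4pt] \text{subject to} & \begin{bmatrix}1 & 3\\ 1 & 1\\ 2 & 1\end{bmatrix}X \le \begin{bmatrix}18\\ 8\\ 14\end{bmatrix} & \text{subject to} & \begin{bmatrix}1 & 1 & 2\\ 3 & 1 & 1\end{bmatrix}Y \ge \begin{bmatrix}1\\ 2\end{bmatrix}\\[4pt] & X \ge 0 & & Y \ge 0\end{array}$$

Here $v = (3, 5)$ and $u = (1/2, 1/2, 0)$; see Example 6.11. The zero component of $u$ corresponds to the inactive constraint $2x + y \le 14$, and since both components of $v$ are nonzero, both constraints of the minimum problem are active at $u$. The two objective values agree:

$$c^Tv = [1\ \ 2]\begin{bmatrix}3\\ 5\end{bmatrix} = 13 = [18\ \ 8\ \ 14]\begin{bmatrix}1/2\\ 1/2\\ 0\end{bmatrix} = b^Tu$$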
Proposition 6.12 may be rephrased as follows. Solving the dual problems of the proposition is equivalent to finding $X$ and $Y$ such that

$$AX \le b,\quad A^TY \ge c,\quad c^TX = b^TY,\quad X \ge 0,\quad Y \ge 0$$

We may change the inequalities to equalities by introducing more variables. Define the unknown vectors $X'$ and $Y'$ by

$$AX + X' = b,\qquad A^TY - Y' = c$$

Then $X' \ge 0$, $Y' \ge 0$. The components of $X'$ are called slack variables and the components of $Y'$ surplus variables because they measure the slack and surplus, respectively, in the constraints. Using these slack and surplus variables, we may restate Proposition 6.12 as


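Proposition 6.13 Solving the dual problems

$$\begin{array}{llll}\text{maximize} & z = c^TX & \text{minimize} & w = b^TY\\ \text{subject to} & AX \le b & \text{subject to} & A^TY \ge c\\ & X \ge 0 & & Y \ge 0\end{array}$$

is equivalent to finding a solution $(X, X', Y, Y')$ of the system

$$\begin{bmatrix} A & I & 0 & 0\\ -c^T & 0 & b^T & 0\\ 0 & 0 & A^T & -I\end{bmatrix}\begin{bmatrix}X\\ X'\\ Y\\ Y'\end{bmatrix} = \begin{bmatrix}b\\ 0\\ c\end{bmatrix},\qquad X, X', Y, Y' \ge 0\qquad (6.11)$$

If $(X, X', Y, Y')$ is a solution, then $0 = X \times Y'$ and $0 = X' \times Y$.

The componentwise products in the conclusion of Proposition 6.13 give the statements in Proposition 6.12 involving zero components and inactive constraints.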


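EXAMPLE 6.14 Continuing Example 6.13, we found the solution $X = (3, 5)$, $Y = (1/2, 1/2, 0)$. From (6.11) we have $AX + X' = b$, and hence

$$X' = b - AX = \begin{bmatrix}18\\ 8\\ 14\end{bmatrix} - \begin{bmatrix}1 & 3\\ 1 & 1\\ 2 & 1\end{bmatrix}\begin{bmatrix}3\\ 5\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 3\end{bmatrix}$$

The zero components flag the active constraints. Similarly

$$Y' = A^TY - c = \begin{bmatrix}1 & 1 & 2\\ 3 & 1 & 1\end{bmatrix}\begin{bmatrix}1/2\\ 1/2\\ 0\end{bmatrix} - \begin{bmatrix}1\\ 2\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}$$

and indeed $0 = X \times Y'$ and $0 = X' \times Y$.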
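A short Python sketch of the computation in this example follows, again with exact fractions; the variable names are illustrative, not from the text. It forms the slack vector $X'$ and the surplus vector $Y'$ of (6.11) and checks that both componentwise products of Proposition 6.13 vanish.

```python
from fractions import Fraction as F

# Standard maximum problem of Example 6.13
A = [[1, 3], [1, 1], [2, 1]]
b = [18, 8, 14]
c = [1, 2]
X = [3, 5]                  # solution of the maximum problem
Y = [F(1, 2), F(1, 2), 0]   # solution of the minimum problem

# Slack variables X' = b - AX and surplus variables Y' = A^T Y - c, as in (6.11)
X_slack = [bi - sum(a * x for a, x in zip(row, X)) for row, bi in zip(A, b)]
Y_surplus = [sum(A[i][j] * Y[i] for i in range(len(A))) - c[j] for j in range(len(c))]

# Componentwise products X x Y' and X' x Y of Proposition 6.13
prod1 = [x * s for x, s in zip(X, Y_surplus)]
prod2 = [s * y for s, y in zip(X_slack, Y)]

print(X_slack)                                   # [0, 0, 3]
print(all(t == 0 for t in prod1 + prod2))        # True
```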


Proposition 6.13 shows that every linear programming problem may be reduced to solving a problem of the form

$$AX = b,\qquad X \ge 0$$

We saw in Example 6.8, on the other hand, that every such problem may be expressed as a linear programming problem. The two types of problems are thus coextensive.

The proof of Farkas' lemma gives one approach to the solution of such problems. Another approach, based on Proposition 6.12, is known as Khachiyan's algorithm. This algorithm, which will find a point in a feasible set in a finite number of steps, caused some excitement in 1979 because theoretical considerations indicated that, for a large number of unknowns and constraints, it should be less costly than current methods. Further investigation showed, however, that the simplex method, discussed in the next section, remains superior for now. The simplex method works with a compact version of Equation (6.11).




EXERCISES 6.2 


In exercises 1 through 5 a function $f(X)$ and a system of inequalities are given. Sketch the feasible set of the system of inequalities and identify the vertices. By applying Proposition 6.10 to $f(X)$ and $-f(X)$, find the maximum and minimum values (if they exist) of the function on the set.


ho fXy sx +5) 2 fN)=5x 45) 3. fit) =4y 
—x+3y<5 —x+4ay <9 3x + 2y< 10 
ways ax—y <9 —x+3y <8 
x+y2~ xeyol xt¢ys4 
ax-y<ul 
4, f(X) = 3y 5. f(X) = -2x —2y 
Wye 8 By + Oy < 18 
Bete -x+3y<4 
Bet ay > ax-ysd 
=x +4 2-6 dea <5 
x+y 20 
arty 2-2 


In exercises 6 through 10 a function $z = c^TX$ to be maximized is given and a set of constraints is specified. From a sketch find all vectors $v$ of Proposition 6.11. Then solve the system (6.10) for each $v$ to find all possible vectors $w$.


6. $z = x + 5y$, the constraints of exercise 1
1 =x = y, the consteaints of exercise 2 

8. $z = 4y$, the constraints of exercise 3

9. $z = -3y$, the constraints of exercise 4

10. $z = 2x + 2y$, the constraints of exercise 5
In exercises 11 through 14 a minimum problem is given. Restate it as a standard minimum problem if necessary and then state the dual problem. Use a sketch to find the solution $X$ of the dual problem and then use Proposition 6.13 to find $X'$, $Y$, and $Y'$ and thus solve the problem.


1, Minimize w= 150, + 10y 42 12 Minimize 
subject toy, + 2p + yy > 3 subject to 
W+y2—My24 
YedeNs Yo Va 
13. Minimize w= 20), +r, +125, 14. Minimize Wt 
subject toy, + yp +2), 23 subject 0 y, ys +9, 0 
4) ye ty 22 +3 — 9,24 
Yuva ¥s 20 Yy Nas 20 


15, Minimize w = y,— 205 — 
subject 10 —y + yp +) > —2 
Yi —Ye—Jp 22 
16. (Computer assignment) The APL functions PSLN and POSOLV below are an implementation of the proof of Farkas' lemma. Use them and Proposition 6.13 to check your answers to exercises 11 and 12.




¥ ZS PSLN A iN:P:SOLN:V;I:X 
TU) ZACH 1 411A) 1) 00 
[2] =(SA.=S+A N41) 70 
13] 20 090 
(4) (06m) 70 
[5] Peps xuR-PERP AL.1} 
(6) ZS PSLN P+. x0 114 
(7) s-S0tnet=tipzy/J 
18) V=(0 14Aye xz, EN }-9 
(9) CCA cte(SeSHAL TINY 
(10) -(S5S+X-VE0; 1]-AL 1511) /END 
111] 2-$ PSLN 0 114 
(12) SOLNA1=1 pz 
113] 4x0 
(14) END Z-(W.SOLN) 0X, 1112 
v 


¥ Z-€ POSOIV A 
(1) 26,1). 1A, C)PSLN A.C 


The function PERP is from Section 5.5. The variable SOLN is 1 when a solution exists and 0 otherwise. To compute a solution of

$$AX = C,\qquad X \ge 0$$

use the expression X←C POSOLV A
17. The solution of the problem

$$\text{minimize } w = |2x + 1|\qquad\text{subject to } x \ge 0$$

is clearly $x = 0$, $w = 1$.

(a) Use Proposition 6.2 to set this problem up as a standard minimum problem in the variables $x$, $d^+$, $d^-$.

(b) Solve the dual problem graphically.

(c) Use Proposition 6.13 to find the values of $x$, $d^+$, $d^-$.


6.3 The Simplex Algorithm 


In practice linear programming problems often involve a large number of varia- 
bles and constraints (several thousand). These problems are run using large pro- 
gramming packages too sophisticated to analyze here. We can, however, describe 
the algorithm that these packages are based upon. 

In this section we will restrict our attention to the dual problems of Proposi- 
tion 6.13 of the last section and the equivalent equation (6.11). 




We begin with the observation that since the feasible points, $S$, of a standard maximum problem satisfy $X \ge 0$ [or $(-I)X \le 0$], Proposition 6.4(b) shows that $S$ cannot contain a whole line. Thus if $S$ is not empty, it has vertices, and hence (Proposition 6.10) the solution, if there is one, must occur at a vertex. The simplex algorithm moves from vertex to vertex seeking a solution. To get started, however, the algorithm needs to be given a vertex.


Proposition 6.14 The solutions, if any, of the standard maximum problem

$$\text{maximize } c^TX\qquad\text{subject to } AX \le b,\ X \ge 0$$

occur at vertices. If $b \ge 0$, then $X = 0$ is a vertex.


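The simplex algorithm deals with compact versions of Equation (6.11), called tableaus, that correspond to vertices. A tableau is a matrix of the form

$$\left[\begin{array}{cc|c} A_1 & A_2 & Z\\ \hline d_1^T & d_2^T & z\end{array}\right]\qquad (6.12)$$

obtained by applying row operations to the matrix

$$\left[\begin{array}{cc|c} A & I & b\\ \hline -c^T & 0 & 0\end{array}\right]\qquad (6.13)$$

extracted from Equation (6.11).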

Let $v$ be a feasible point of the standard maximum problem (6.1) and $v' = b - Av$ the vector of values of the slack variables. Notice that the zero components of the vector $(v, v')$ flag the active constraints at $v$. The zeros of $v$ flag the active constraints of the form $x_i \ge 0$, and the zeros of $v'$ flag the other active constraints, those for which a slack variable is zero. If the constraint flat at $v$ is $\pi_v(T) = v + PT$, we set $\pi'_v(T) = b - A\pi_v(T)$, and the vector $(\pi_v(T), \pi'_v(T))$ has the same pattern of zeros as the vector $(v, v')$ by Proposition 6.8.

Now let the index vector $J$ give the nonzero components of $(v, v')$ and set $\rho(T) = (\pi_v(T), \pi'_v(T))[J]$. Then $\rho(T)$ is a flat of solutions of $(A \mid I)[;J]Z = b$. In fact, $\rho(T)$ gives all solutions of this equation (exercise 23). Since the flats $\rho(T)$ and $\pi_v(T)$ have the same dimension (the rank of $P$), it follows that $v$ is a vertex if and only if $\rho(T)$ is a single point, which is equivalent to $(A \mid I)[;J]$ having independent columns.

Not every set of independent columns of $(A \mid I)$ gives rise to a vertex, however [$(A \mid I)[;J]Z = b$ may have no feasible solutions], and in order to recognize vertices easily we impose some restrictions on the standard maximum problem (6.1).

Let $A$ be $m$-by-$n$; then $(A \mid I)$ is $m$-by-$(n + m)$. Since the feasible set is in $R^n$, a vertex will in general have $n$ active constraints, hence $J$ will have $n + m - n = m$ components. In this case $(A \mid I)[;J]$ would be an invertible matrix. If there are extra (redundant) active constraints at the vertex, however, then $J$ will have fewer than $m$ components and $(A \mid I)[;J]$ will not be square. The simplex algorithm proceeds on the assumption that this situation does not arise, and in fact it is not often a problem in practice.

So a vertex gives $m$ independent columns of $(A \mid I)$. But not every set $B = (A \mid I)[;J]$ of $m$ independent columns defines a vertex of (6.1). If we set $(v, v')[J] = B^{-1}b$ and set the rest of the components of $(v, v')$ to zero, then $v$ is a vertex if and only if $(v, v') \ge 0$.

Thus: Vertices of (6.1) without redundant active constraints correspond to sets of $m$ independent columns $B = (A \mid I)[;J]$ for which $B^{-1}b \ge 0$.
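The criterion just stated can be tried directly. The Python sketch below is a minimal illustration, assuming 1-based column indices as in the text; `solve` and `basic_solution` are hypothetical helper names. It builds $B = (A \mid I)[;J]$, solves $BZ = b$ exactly by Gauss-Jordan elimination, and reports whether the resulting $(v, v')$ is nonnegative, using the problem of Example 6.10.

```python
from fractions import Fraction

def solve(B, rhs):
    """Solve B z = rhs exactly by Gauss-Jordan elimination; return None if B is singular."""
    n = len(B)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(B, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None                      # dependent columns
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

def basic_solution(A, b, J):
    """Return (v, v') with (v, v')[J] = B^{-1} b and other components 0, where
    B = (A | I)[;J] and J holds 1-based column indices; None if B is singular."""
    m, n = len(A), len(A[0])
    AI = [list(row) + [int(i == k) for k in range(m)] for i, row in enumerate(A)]
    B = [[AI[i][j - 1] for j in J] for i in range(m)]
    z = solve(B, b)
    if z is None:
        return None
    full = [Fraction(0)] * (n + m)
    for j, zj in zip(J, z):
        full[j - 1] = zj
    return full

A = [[1, 3], [1, 1], [2, 1]]     # Example 6.10 as a standard maximum problem
b = [18, 8, 14]
for J in ([3, 4, 5], [2, 4, 5], [2, 3, 4]):   # prints True, True, False
    vv = basic_solution(A, b, J)
    print(J, vv is not None and all(x >= 0 for x in vv))
```

For $J = 3\ 4\ 5$ and $J = 2\ 4\ 5$ the vector is nonnegative (these give the vertices $(0, 0)$ and $(0, 6)$); for $J = 2\ 3\ 4$ the columns are independent but $(v, v') = (0, 14, -24, -6, 0)$ is not feasible.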

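If $v$ is a vertex without redundant active constraints, then a tableau for $v$, (6.12), is obtained by row-reducing $B = (A \mid I)[;J]$ in (6.13) to an identity matrix. The product of the elementary matrices for this row reduction is of the form

$$\left[\begin{array}{c|c} B^{-1} & 0\\ \hline w^T & 1\end{array}\right]\qquad (6.14)$$

where the $(w^T \mid 1)$ appears because (6.13) has one more row than $(A \mid I)$. Multiplying (6.13) by (6.14) gives

$$\left[\begin{array}{c|c} B^{-1} & 0\\ \hline w^T & 1\end{array}\right]\left[\begin{array}{cc|c} A & I & b\\ \hline -c^T & 0 & 0\end{array}\right] = \left[\begin{array}{cc|c} B^{-1}A & B^{-1} & B^{-1}b\\ \hline w^TA - c^T & w^T & w^Tb\end{array}\right]\qquad (6.15)$$

Thus $d_1 = A^Tw - c$, $d_2 = w$, $z = b^Tw$, and $Z = B^{-1}b$. On the other hand, applying (6.14) to the columns $J$ of (6.13) must produce the identity above a row of zeros:

$$\left[\begin{array}{c|c} B^{-1} & 0\\ \hline w^T & 1\end{array}\right]\left[\begin{array}{c} B\\ \hline (-c^T \mid 0)[J]\end{array}\right] = \left[\begin{array}{c} I\\ \hline w^TB - (c^T \mid 0)[J]\end{array}\right] = \left[\begin{array}{c} I\\ \hline 0\end{array}\right]$$

Thus

$$B^Tw = \begin{bmatrix}c\\ 0\end{bmatrix}[J]\qquad\text{or}\qquad w = (B^T)^{-1}\left(\begin{bmatrix}c\\ 0\end{bmatrix}[J]\right)$$

and hence, since inverse commutes with transpose,

$$b^Tw = \left((c^T \mid 0)[J]\right)B^{-1}b = (c^T \mid 0)[J]\,(v, v')[J] = (c^T \mid 0)\begin{bmatrix}v\\ v'\end{bmatrix} = c^Tv$$

Thus, setting $w' = d_1 = A^Tw - c$, we have that $(X, X', Y, Y') = (v, v', w, w')$ satisfies all of Equation (6.11) except possibly $(Y, Y') \ge 0$, and the tableau (6.12) is

$$\left[\begin{array}{cc|c} B^{-1}A & B^{-1} & B^{-1}b\\ \hline w'^T & w^T & c^Tv\end{array}\right]\qquad (6.16)$$

Notice that $c^Tv = z$ is the value of the objective function at the vertex $v$. Thus a tableau with $(w, w') \ge 0$ gives a solution of the dual problems of Proposition 6.13.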

In practice no row interchanges are used in manipulating tableaus. This does 
not affect the computations above, since the components of J were not assumed to 
be in any given order (see Example 6.16 below). 


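EXAMPLE 6.15 Consider the problem of Example 6.10 (Figure 6.4). The matrix (6.13) is

$$\left[\begin{array}{rr|rrr|r} 1 & 3 & 1 & 0 & 0 & 18\\ 1 & 1 & 0 & 1 & 0 & 8\\ 2 & 1 & 0 & 0 & 1 & 14\\ \hline -1 & -2 & 0 & 0 & 0 & 0\end{array}\right]\qquad (6.17)$$

This matrix is also a tableau (6.12) for the vertex $v = (0, 0)$. The independent columns are $J = 3\ 4\ 5$, so $(v, v')[3\ 4\ 5] = 18\ 8\ 14$ and the other components are zero; that is, $v = (0, 0)$, $v' = (18, 8, 14)$. Taking $w = (0, 0, 0)$, we have $w' = (-1, -2)$ and $c^Tv = b^Tw = 0$.

The second, fourth, and fifth columns are also independent. If we pivot on the 1:2 entry and then set it to 1, we obtain

$$\left[\begin{array}{rr|rrr|r} 1/3 & 1 & 1/3 & 0 & 0 & 6\\ 2/3 & 0 & -1/3 & 1 & 0 & 2\\ 5/3 & 0 & -1/3 & 0 & 1 & 8\\ \hline -1/3 & 0 & 2/3 & 0 & 0 & 12\end{array}\right]\qquad (6.18)$$

This time $(v, v')[2\ 4\ 5] = 6\ 2\ 8$, so $v = (0, 6)$ is a vertex, $w = (2/3, 0, 0)$, $w' = (-1/3, 0)$, and $c^Tv = b^Tw = 12$.

If we had pivoted instead on either of the other entries of the second column, we would not have had $(v, v') \ge 0$, and $v$ would not be feasible.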


DEFINITION 6.8 In a tableau (6.12) there is a vector $J$ of $m$ indices such that $(A_1 \mid A_2)[;J] = I_m$. The vector $(v, v')$ with $(v, v')[J] = Z \ge 0$ and other components zero is called a basic feasible solution. The columns given by $J$ are called basic columns and the corresponding variables basic variables.




THE SIMPLEX METHOD 


The simplex method is based on the observation that, given the tableau for a vertex, it is easy to find an adjacent vertex that is closer to a solution.

Assuming that there are no redundant active constraints at the vertex $v$, $v$ is the intersection of hyperplanes $\pi_1, \ldots, \pi_n$, the (irredundant) active constraints. The intersection of any $n - 1$ of these hyperplanes is a line. So, dropping one hyperplane at a time, we have $n$ lines intersecting at $v$. The feasible points along these lines form $n$ edges, line segments between vertices (Figure 6.4).

Now suppose that the vertices $u$ and $v$ are connected by an edge. If $J_u$ and $J_v$ are the indices for the basic columns in tableaus for $u$ and $v$, then, since they have the $n - 1$ hyperplanes defining the edge in common, the vectors $J_u$ and $J_v$ have $m - 1$ components in common. Thus we may move from $u$ to $v$ or back by cleaning up one column in the corresponding tableau.


EXAMPLE 6.16 In Example 6.15 we obtained a tableau (6.18) for the vertex $u = (0, 6)$ of the feasible set of Figure 6.4. The vertex $v = (3, 5)$ is connected to $(0, 6)$ by an edge. The vector $J_u = (2, 4, 5)$, and since $v' = b - Av$ we have $(v, v') = (3, 5, 0, 0, 3)$, which shows $J_v = (1, 2, 5)$. Thus to move from $u$ to $v$ we should clean up the first column, leaving the second and fifth columns unchanged. To do this we pivot on the 2:1 entry of (6.18), obtaining

$$\left[\begin{array}{rr|rrr|r} 0 & 1 & 1/2 & -1/2 & 0 & 5\\ 1 & 0 & -1/2 & 3/2 & 0 & 3\\ 0 & 0 & 1/2 & -5/2 & 1 & 3\\ \hline 0 & 0 & 1/2 & 1/2 & 0 & 13\end{array}\right]\qquad (6.19)$$

Strictly construed, this tableau gives $J_v = (2, 1, 5)$ and $(v, v')[2\ 1\ 5] = (5, 3, 3)$, so we have $(v, v') = (3, 5, 0, 0, 3)$ as we should. [To get $J_v = (1, 2, 5)$ we must interchange rows 1 and 2.] From the last row of (6.19) we see that $w = (1/2, 1/2, 0)$, $w' = (0, 0)$ is a solution of the dual problem, and $b^Tw = c^Tv = 13$.
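The pivots of Examples 6.15 and 6.16 can be reproduced in a few lines of Python. This is a sketch assuming exact `Fraction` arithmetic and 0-based indices (so the text's 1:2 entry is `(0, 1)`); `pivot` is a hypothetical stand-in that combines what the book's APL functions PIVOT and LOR do, not a transcription of them.

```python
from fractions import Fraction

def pivot(T, i, j):
    """Clean up column j of tableau T on the entry T[i][j] (0-based indices).

    Row i is divided by T[i][j] and multiples of it are subtracted from the other
    rows, so column j becomes a unit column; rows are never interchanged.
    """
    p = T[i][j]
    prow = [x / p for x in T[i]]
    return [prow if r == i else [x - row[j] * y for x, y in zip(row, prow)]
            for r, row in enumerate(T)]

# The matrix (6.17): columns x1, x2, three slack variables, right-hand side
T617 = [[Fraction(x) for x in row] for row in
        [[1, 3, 1, 0, 0, 18],
         [1, 1, 0, 1, 0, 8],
         [2, 1, 0, 0, 1, 14],
         [-1, -2, 0, 0, 0, 0]]]

T618 = pivot(T617, 0, 1)   # the text's 1:2 entry, giving (6.18)
T619 = pivot(T618, 1, 0)   # the text's 2:1 entry, giving (6.19)
for row in T619:
    print(" ".join(str(x) for x in row))
```

The four printed rows are `0 1 1/2 -1/2 0 5`, `1 0 -1/2 3/2 0 3`, `0 0 1/2 -5/2 1 3`, and `0 0 1/2 1/2 0 13`, which is (6.19).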


In seeking solutions, our situation is somewhat different from that in the above examples. We will know one vertex, $v$, and we wish to find another vertex closer to a solution. From the above discussion we know that we should be able to reach any of the adjacent vertices by cleaning up a column of a tableau for $v$. But which entry should we use for a pivot? Not just any entry will do.

For a valid tableau we must have $Z \ge 0$ in (6.12). Suppose we pivot on $a_i$:

$$\left[\begin{array}{ccc|c} \cdots & a_j & \cdots & v_j\\ \cdots & a_i & \cdots & v_i\\ \hline \cdots & \delta & \cdots & z\end{array}\right]\qquad (6.20)$$

Then in the last column we have three cases:

$v_i$ is replaced by $v_i/a_i$
$v_j$ is replaced by $v_j - a_jv_i/a_i$ for $j \ne i$
$z$ is replaced by $z - \delta v_i/a_i$

Since we are starting with a valid tableau, $v_j \ge 0$ for all $j$. Thus if $a_i < 0$ (and $v_i > 0$), then $v_i/a_i < 0$ and we do not have a valid tableau. Thus we need $a_i > 0$. For a valid tableau we also need $v_j - a_jv_i/a_i \ge 0$, which holds automatically when $a_j \le 0$ and otherwise means $v_j/a_j \ge v_i/a_i$. So $a_i$ must give the minimum ratio $v_j/a_j$ as $j$ varies over the rows with $a_j > 0$.

Last, we would like to move closer to a solution. Since negative entries of $[d_1^T\ d_2^T]$ in (6.12) show that we do not have a solution, we may as well take a column of (6.20) with $\delta < 0$, since $\delta$ will be replaced by 0. If we do this, notice that $z$, the value of the objective function, becomes $z - \delta v_i/a_i \ge z$ (with strict inequality when $v_i > 0$), so we do move closer to a solution.
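The pivot rule just derived (the most negative entry of the last row picks the column, then the minimum ratio over positive entries picks the row) can be sketched in Python as follows. The function name `choose_pivot` and the status strings are illustrative assumptions; ties are broken by taking the first index.

```python
from fractions import Fraction

def choose_pivot(T):
    """Return the (row, column) of the next pivot in tableau T (0-based), or a status.

    Column: most negative entry of the last row, right-hand side excluded.
    Row: among rows with a positive entry in that column, the smallest ratio
    of right-hand side to entry. Ties go to the first index.
    """
    last = T[-1][:-1]
    j = min(range(len(last)), key=lambda k: last[k])
    if last[j] >= 0:
        return "optimal"              # no negative entries: the tableau is final
    rows = [i for i in range(len(T) - 1) if T[i][j] > 0]
    if not rows:
        return "unbounded"            # no positive entry: no solution
    i = min(rows, key=lambda r: Fraction(T[r][-1]) / T[r][j])
    return i, j

T617 = [[1, 3, 1, 0, 0, 18],
        [1, 1, 0, 1, 0, 8],
        [2, 1, 0, 0, 1, 14],
        [-1, -2, 0, 0, 0, 0]]
print(choose_pivot(T617))   # (0, 1): the 1:2 entry used in Example 6.15
```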

We will now state the simplex algorithm, in words and APL. 


Statement of the Simplex Algorithm 


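Suppose that

$$\left[\begin{array}{cc|c} A_1 & A_2 & Z\\ \hline d_1^T & d_2^T & z\end{array}\right]\qquad (6.21)$$

is the tableau of a vertex. Suppose that this tableau, T, is M-by-N.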


1. Let $T[;J]$ be the column of T containing the most negative entry of $[d_1^T\ d_2^T]$. This is the pivot column: J←T[M;]⍳⌊/¯1↓T[M;]

If there are no negative entries, stop: →(T[M;J]≥0)/0

2. Let P be the vector of row indices for which T[P;J] is positive: P←(0<T[⍳M-1;J])/⍳M-1

If there are no such entries, then the problem has no solution; stop: →(0=⍴P)/0


3. Divide the positive entries of column J into the corresponding entries of the last column. Let the Ith row give the minimum result: I←P[S⍳⌊/S←T[P;N]÷T[P;J]]

4. Clean up the Jth column by pivoting on the I:J entry and return to step 1: →L,0⍴T←(I,J)LOR(I,J)PIVOT T (The functions LOR and PIVOT are defined in Section 3.2.)


The fact that an empty vector P in step 2 implies that the problem has no solution is left as exercise 24.
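For readers without an APL interpreter, here is a Python sketch of steps 1 through 4, assuming exact `Fraction` arithmetic and 0-based indices. It is not the book's function MAX (described below), though it includes a similar iteration cap as a guard against cycling. The helper `read_solution` is an illustrative assumption: it reads a variable's value from its column when that column is a unit column with a zero in the last row. The data are those of Example 6.17 below.

```python
from fractions import Fraction

def pivot(T, i, j):
    """Clean up column j of T on the entry T[i][j]; no row interchanges."""
    p = T[i][j]
    prow = [x / p for x in T[i]]
    return [prow if r == i else [x - row[j] * y for x, y in zip(row, prow)]
            for r, row in enumerate(T)]

def simplex(T, max_iter=50):
    """Apply steps 1-4 to the tableau T of a vertex.

    Returns (status, final tableau) with status "optimal", "no solution"
    (empty P in step 2), or "iteration limit" (a guard against cycling).
    """
    T = [[Fraction(x) for x in row] for row in T]
    for _ in range(max_iter):
        last = T[-1][:-1]
        j = min(range(len(last)), key=lambda k: last[k])       # step 1
        if last[j] >= 0:
            return "optimal", T
        P = [i for i in range(len(T) - 1) if T[i][j] > 0]      # step 2
        if not P:
            return "no solution", T
        i = min(P, key=lambda r: T[r][-1] / T[r][j])           # step 3
        T = pivot(T, i, j)                                     # step 4
    return "iteration limit", T

def read_solution(T, n):
    """Values of the first n variables; assumes a basic column is a unit column
    with a zero in the last row, and sets every other variable to 0."""
    x = [Fraction(0)] * n
    for j in range(n):
        col = [row[j] for row in T[:-1]]
        if T[-1][j] == 0 and col.count(1) == 1 and col.count(0) == len(col) - 1:
            x[j] = T[col.index(1)][-1]
    return x

# Example 6.17: maximize 35x1 + 20x2 subject to three constraints, x1, x2 >= 0
status, T = simplex([[1, 1, 1, 0, 0, 70],
                     [1, 2, 0, 1, 0, 80],
                     [3, 1, 0, 0, 1, 140],
                     [-35, -20, 0, 0, 0, 0]])
x = read_solution(T, 2)
print(status, [str(v) for v in x], T[-1][-1], [str(y) for y in T[-1][2:5]])
# optimal ['40', '20'] 1800 ['0', '5', '10']
```

Exact fractions sidestep the small rounding residues that appear in floating-point tableaus; a production code would instead use tolerances, as MAX does.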


EXAMPLE 6.17 We will use the simplex algorithm to solve the standard maximum problem

$$\begin{aligned}\text{maximize}\quad & z = 35x_1 + 20x_2\\ \text{subject to}\quad & x_1 + x_2 \le 70\\ & x_1 + 2x_2 \le 80\\ & 3x_1 + x_2 \le 140\\ & x_1, x_2 \ge 0\end{aligned}$$

of Example 6.1. Since $b \ge 0$, the origin $(x_1, x_2) = (0, 0)$ is a vertex with tableau (6.13):

$$\left[\begin{array}{rr|rrr|r} 1 & 1 & 1 & 0 & 0 & 70\\ 1 & 2 & 0 & 1 & 0 & 80\\ 3 & 1 & 0 & 0 & 1 & 140\\ \hline -35 & -20 & 0 & 0 & 0 & 0\end{array}\right]$$

The most negative entry in the last row is $-35$, and 70 80 140 ÷ 1 1 3 is 70 80 46.7, so we pivot on the 3. The result is the tableau:

$$\left[\begin{array}{rr|rrr|r} 0 & 2/3 & 1 & 0 & -1/3 & 70/3\\ 0 & 5/3 & 0 & 1 & -1/3 & 100/3\\ 1 & 1/3 & 0 & 0 & 1/3 & 140/3\\ \hline 0 & -25/3 & 0 & 0 & 35/3 & 4900/3\end{array}\right]$$

The most negative entry in the last row is $-25/3$, and $(70/3,\ 100/3,\ 140/3) \div (2/3,\ 5/3,\ 1/3)$ is 35 20 140, so we pivot on the $5/3$. The result is the tableau:

$$\left[\begin{array}{rr|rrr|r} 0 & 0 & 1 & -2/5 & -1/5 & 10\\ 0 & 1 & 0 & 3/5 & -1/5 & 20\\ 1 & 0 & 0 & -1/5 & 2/5 & 40\\ \hline 0 & 0 & 0 & 5 & 10 & 1800\end{array}\right]$$

Thus we have the solution $x_1 = 40$, $x_2 = 20$. The slack variables are $x_3 = 10$, $x_4 = x_5 = 0$. Thus the farmer should plant 40 acres with corn and 20 acres with soybeans. This uses all the available labor and all the available capital, and leaves 10 acres unplanted. The profit is \$1800.

The solution to the dual problem is $y_1 = 0$, $y_2 = 5$, $y_3 = 10$, $y'_1 = y'_2 = 0$. Thus the nutritionist of Example 6.2 should use 5 grams of the second mixture and 10 grams of the third to give precisely 35 units of calcium ($y'_1 = 0$) and 20 units of iron ($y'_2 = 0$) per batch, for a cost of 1800 cents = \$18.


THE FUNCTION MAX 


To write an APL function implementing the simplex algorithm is not difficult. The function MAX below, which takes a tableau for input and returns a final tableau, incorporates two changes in the above description of the simplex algorithm. First, the tests in steps 1 and 2 are made relative to the largest magnitude in the original tableau (see the line MAX[2] below for the redefinition of the tableau A). Second, the function stops if the maximum has not been found after ten iterations. Although it seems to occur rarely in practice, the simplex algorithm can go into an infinite loop; this is called cycling. From our discussion, this can happen only at a vertex with a redundant active constraint. From the discussion that follows (6.20) we see that it can happen only if $v_i = 0$ for some $i$, that is, if the vector $Z$ of (6.21) has a zero component. If MAX returns a tableau with negative components in $d_1$ or $d_2$, check $Z$ for zero components. If none are present, give the returned tableau back to MAX for another ten iterations.


WToMAX A MONT SiK.PLS 


TV} Rex (Ma (pA) EVI) *Ne=( 9A) (21 
[2] Anf /,1 Toa 

(9) UedetiMi lec 147 IMG} 

14) s(AcAN TIME 1) (0 

15] =(OnpP(AcAMT (Mi d})/1M) 00 


{8} feP{Sst /Se=T(P:N,J] 
(7) THChJ)LORCY.s)PIVOT T 
[B} (1 0>Ke Ket) /L 


FINDING A VERTEX 
In Example 6.17 the origin $(x_1, x_2) = (0, 0)$ was a vertex. The reason is that the vector $b$ in Equation (6.1) was nonnegative. If $b$ has negative components, then there may be no obvious vertex to start with.

The way out of this dilemma is to notice that the problem of finding a vertex in such a case can be rephrased as another linear programming problem, one with an obvious vertex.


EXAMPLE 6.18 


$$\begin{aligned}\text{maximize}\quad & z = 2x_1 + 4x_2\\ \text{subject to}\quad & -2x_1 + 3x_2 \le 7\\ & 3x_1 + x_2 \le 17\\ & -x_1 - 4x_2 \le -13\\ & x_1, x_2 \ge 0\end{aligned}$$

Since $b = (7, 17, -13)$ has a negative component, the origin is not a vertex.

We introduce a new variable $x'$, called an artificial variable, into the last equation and consider the new problem:

$$\begin{aligned}\text{maximize}\quad & z' = -x'\\ \text{subject to}\quad & -2x_1 + 3x_2 \le 7\\ & 3x_1 + x_2 \le 17\\ & -x' - x_1 - 4x_2 \le -13\\ & x', x_1, x_2 \ge 0\end{aligned}$$

Then $(x', x_1, x_2) = (13, 0, 0)$ is a feasible point of the new problem, and since three independent constraints ($x_1 = 0$, $x_2 = 0$, $-x' - x_1 - 4x_2 = -13$) are active at this point, it is a vertex. If we can find a vertex of this latter problem for which $z' = -x' = 0$ (note that $-x' \le 0$ always), we will have a feasible point of the original system. Further, the first column will not be a basic column ($x' = 0$), so the solution to the second problem will give a vertex of the first problem.

If, on the other hand, the maximum value of $z'$ is negative, then the feasible set of the first problem is empty.

We begin with the array (6.13) for the second problem:

$$T = \left[\begin{array}{rrr|rrr|r} 0 & -2 & 3 & 1 & 0 & 0 & 7\\ 0 & 3 & 1 & 0 & 1 & 0 & 17\\ -1 & -1 & -4 & 0 & 0 & 1 & -13\\ \hline 1 & 0 & 0 & 0 & 0 & 0 & 0\end{array}\right]$$

Here $Z = (13, 0, 0, 7, 17, 0)$, so $J = 1\ 4\ 5$.

To get a tableau for this vertex we must row-reduce $T[;J]$ to the identity, or rather, some permutation of the columns of $T[;J]$ should be row-reduced to the identity augmented by a last row of zeros. It is sufficient to clean up the first column.


      T←3 1 LOR 3 1 PIVOT T
0.000 ¯2.000  3.000 0.000 0.000  0.000   7.000
0.000  3.000  1.000 0.000 1.000  0.000  17.000
1.000  1.000  4.000 0.000 0.000 ¯1.000  13.000
0.000 ¯1.000 ¯4.000 0.000 0.000  1.000 ¯13.000


This is a starting tableau for the simplex method. A final tableau is

      F←MAX T
 1.82E¯1 ¯8.67E¯19  1.00E0    9.09E¯2 0.00E0 ¯1.82E¯1 3.00E0
¯1.00E0   0.00E0    5.20E¯18  1.00E0  1.00E0  1.00E0  1.10E1
 2.73E¯1  1.00E0    1.89E¯18 ¯3.64E¯1 0.00E0 ¯2.73E¯1 1.00E0
 1.00E0   0.00E0    0.00E0    0.00E0  0.00E0  0.00E0  0.00E0

The maximum value is 0, attained at $x' = 0$, $x_1 = 1$, $x_2 = 3$, that is, at the intersection of the first and third constraints $-2x_1 + 3x_2 = 7$ and $x_1 + 4x_2 = 13$.




To get the array (6.13) for this vertex, we first discard the first column of F and then insert $[-c^T\ 0\ 0\ 0\ 0]$ into the last row.

      S←0 1↓F
      S[4;1 2]←¯2 ¯4
      S
¯8.67E¯19  1.00E0    9.09E¯2 0.00E0 ¯1.82E¯1 3.00E0
 0.00E0    5.20E¯18  1.00E0  1.00E0  1.00E0  1.10E1
 1.00E0    1.89E¯18 ¯3.64E¯1 0.00E0 ¯2.73E¯1 1.00E0
¯2.00E0   ¯4.00E0    0.00E0  0.00E0  0.00E0  0.00E0

To get a tableau for this vertex we row-reduce S[;1 2 4] to a permutation of the identity matrix augmented with a last row of zeros.


      S←3 1 LOR 3 1 PIVOT S
0.00E0  1.00E0    9.09E¯2 0.00E0 ¯1.82E¯1 3.00E0
0.00E0  5.20E¯18  1.00E0  1.00E0  1.00E0  1.10E1
1.00E0  1.89E¯18 ¯3.64E¯1 0.00E0 ¯2.73E¯1 1.00E0
0.00E0 ¯4.00E0   ¯7.27E¯1 0.00E0 ¯5.45E¯1 2.00E0
      S←1 2 LOR 1 2 PIVOT S
0.000 1.000  0.091 0.000 ¯0.182  3.000
0.000 0.000  1.000 1.000  1.000 11.000
1.000 0.000 ¯0.364 0.000 ¯0.273  1.000
0.000 0.000 ¯0.364 0.000 ¯1.273 14.000

Applying MAX to S gives the final tableau

0.000 1.000  0.273 0.182 0.000  5.000
0.000 0.000  1.000 1.000 1.000 11.000
1.000 0.000 ¯0.091 0.273 0.000  4.000
0.000 0.000  0.909 1.273 0.000 28.000


Thus the maximum value of 28 is attained at the vertex $(x_1, x_2) = (4, 5)$. Since $(x_3, x_4, x_5) = (0, 0, 11)$, this vertex is the intersection of the first and second constraints: $-2x_1 + 3x_2 = 7$ and $3x_1 + x_2 = 17$. The solution to the dual problem is $Y = (.909, 1.27, 0)$, $Y' = (0, 0)$.


The technique of the last example works in general. For each negative component of the vector $b$, add an artificial variable to the corresponding equation. Then maximize the negative of the sum of the artificial variables.
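A Python sketch of this artificial-variable construction follows, under these assumptions: the constraints are $AX \le b$, $X \ge 0$; columns are ordered as in Examples 6.18 and 6.19 (artificial variables, original variables, slack variables, right-hand side); and a compact copy of the simplex steps is included so the block runs on its own. The names `phase_one` and `basic_value` are illustrative. The feasible set is nonempty exactly when the final value of the objective is 0.

```python
from fractions import Fraction

def pivot(T, i, j):
    p = T[i][j]
    prow = [x / p for x in T[i]]
    return [prow if r == i else [x - row[j] * y for x, y in zip(row, prow)]
            for r, row in enumerate(T)]

def simplex(T, max_iter=50):
    """Steps 1-4 of the simplex algorithm; returns the final tableau."""
    for _ in range(max_iter):
        last = T[-1][:-1]
        j = min(range(len(last)), key=lambda k: last[k])
        if last[j] >= 0:
            return T
        P = [i for i in range(len(T) - 1) if T[i][j] > 0]
        if not P:
            raise ValueError("empty P: no solution")
        i = min(P, key=lambda r: T[r][-1] / T[r][j])
        T = pivot(T, i, j)
    raise RuntimeError("iteration limit reached (possible cycling)")

def phase_one(A, b):
    """Artificial-variable start for A X <= b, X >= 0.

    Column order: artificial variables, original variables, slack variables,
    right-hand side. Returns the final tableau and the number k of artificials.
    """
    m, n = len(A), len(A[0])
    neg = [i for i in range(m) if b[i] < 0]      # one artificial per negative b_i
    k = len(neg)
    T = []
    for i in range(m):
        art = [Fraction(-1 if i == neg[t] else 0) for t in range(k)]
        slack = [Fraction(int(i == s)) for s in range(m)]
        T.append(art + [Fraction(a) for a in A[i]] + slack + [Fraction(b[i])])
    T.append([Fraction(1)] * k + [Fraction(0)] * (n + m + 1))  # maximize -(sum)
    for i in neg:                 # tableau for the obvious vertex:
        T[-1] = [x + y for x, y in zip(T[-1], T[i])]   # add row to the bottom row,
        T[i] = [-x for x in T[i]]                      # then change its signs
    return simplex(T), k

def basic_value(T, j):
    """Value of the variable in column j (0 unless column j is a unit column)."""
    col = [row[j] for row in T[:-1]]
    if T[-1][j] == 0 and col.count(1) == 1 and col.count(0) == len(col) - 1:
        return T[col.index(1)][-1]
    return Fraction(0)

# Example 6.19: -3x - y <= -7, -x - y <= -5, -x - 4y <= -11, x, y >= 0
T, k = phase_one([[-3, -1], [-1, -1], [-1, -4]], [-7, -5, -11])
print(T[-1][-1], basic_value(T, k), basic_value(T, k + 1))   # 0 3 2
```

With exact arithmetic the final objective value comes out exactly 0, so the small negative residue that a floating-point tableau can show does not arise.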




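EXAMPLE 6.19 Find a vertex of the set

$$-3x - y \le -7,\quad -x - y \le -5,\quad -x - 4y \le -11,\quad x, y \ge 0$$

Solution The linear programming problem is

$$\begin{aligned}\text{maximize}\quad & z = -x'_1 - x'_2 - x'_3\\ \text{subject to}\quad & -x'_1 - 3x - y \le -7\\ & -x'_2 - x - y \le -5\\ & -x'_3 - x - 4y \le -11\\ & x'_1, x'_2, x'_3, x, y \ge 0\end{aligned}$$

The array (6.13) is

$$\left[\begin{array}{rrrrr|rrr|r} -1 & 0 & 0 & -3 & -1 & 1 & 0 & 0 & -7\\ 0 & -1 & 0 & -1 & -1 & 0 & 1 & 0 & -5\\ 0 & 0 & -1 & -1 & -4 & 0 & 0 & 1 & -11\\ \hline 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0\end{array}\right]$$

The obvious vertex is $(x'_1, x'_2, x'_3) = (7, 5, 11)$, $x = y = 0$, and it is quite easy to obtain the tableau, $T$, for such a vertex: just add the sum of the upper rows to the bottom row, then change all the signs in the upper rows.

$$T = \left[\begin{array}{rrrrr|rrr|r} 1 & 0 & 0 & 3 & 1 & -1 & 0 & 0 & 7\\ 0 & 1 & 0 & 1 & 1 & 0 & -1 & 0 & 5\\ 0 & 0 & 1 & 1 & 4 & 0 & 0 & -1 & 11\\ \hline 0 & 0 & 0 & -5 & -6 & 1 & 1 & 1 & -23\end{array}\right]$$

The function MAX applied to T returns a final tableau.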


Although the value of the objective function is still negative, we take it to be zero, since this machine carries only eighteen digits.

For this final tableau the artificial variables are all zero; $x = 3$, $y = 2$ is the vertex, which is the intersection of the second and third constraints: $x + y = 5$ and $x + 4y = 11$.




It is clear from the last example that the artificial-variable technique is a general method for finding a point in the set defined by $AX \le b$, $X \ge 0$ (a general class of problems that, via Proposition 6.13, can be shown to be coextensive with the set of all linear programming problems). In fact the simplex tableau can do more than find a single point of such a set: it yields a description of the entire set. This is illustrated in the next example for a situation where the geometry is easy to understand.


EXAMPLE 6.20 Does the line $3x + 7y = 29$ intersect the triangle with vertices $(2, 1)$, $(4, 1)$, and $(2, 3)$? What about the line $3x + 7y = 22$?

A sketch (Figure 6.10) shows the situation. The first line does not intersect the 
triangle and the second line does. Let us see how the simplex tableau may be used 
to analyze this situation (and higher-dimensional ones where no sketch is possible).

The sides of the triangle lie on the lines $x = 2$, $y = 1$, and $x + y = 5$. The triangle is thus the set defined by

$$x \ge 2,\qquad y \ge 1,\qquad x + y \le 5.$$


The line $3x + 7y = \text{const.}$ intersects the triangle if and only if the set of points defined by the system

$$\begin{aligned}
3x + 7y &= \text{const.}\\
x &\ge 2\\
y &\ge 1\\
x + y &\le 5
\end{aligned}
\qquad\text{or, equivalently,}\qquad
\begin{aligned}
3x + 7y &\le \text{const.}\\
-3x - 7y &\le -\text{const.}\\
-x &\le -2\\
-y &\le -1\\
x + y &\le 5
\end{aligned}$$

is nonempty. Since $x \ge 2$ and $y \ge 1$, the positivity conditions $x, y \ge 0$ are automatic. Were this not the case, we would have to break the variables up into positive and negative parts ($x = x^+ - x^-$, $y = y^+ - y^-$) or move the origin of the coordinate system.

FIGURE 6.10

To locate a point of the set we introduce one artificial variable for each negative term on the right-hand side and maximize the negative of their sum. The constant below is either 29 or 22.


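$$\begin{aligned}
\text{maximize}\quad & z = -x_1'' - x_2'' - x_3''\\
\text{subject to}\quad & 3x + 7y \le \text{const.}\\
& -3x - 7y - x_1'' \le -\text{const.}\\
& -x - x_2'' \le -2\\
& -y - x_3'' \le -1\\
& x + y \le 5
\end{aligned}$$

The array (6.13) for this system is

$$\begin{array}{rrrrr|rrrrr|r}
0 & 0 & 0 & 3 & 7 & 1 & 0 & 0 & 0 & 0 & \text{const.}\\
-1 & 0 & 0 & -3 & -7 & 0 & 1 & 0 & 0 & 0 & -\text{const.}\\
0 & -1 & 0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & -2\\
0 & 0 & -1 & 0 & -1 & 0 & 0 & 0 & 1 & 0 & -1\\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 5\\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}$$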


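and a vertex is given by $(x_1'', x_2'', x_3'') = (\text{const.}, 2, 1)$, $(x, y) = (0, 0)$, $(x_1', x_2', x_3', x_4', x_5') = (\text{const.}, 0, 0, 0, 5)$. To obtain a tableau $T$ for this vertex we add rows 2, 3, and 4 to the bottom row and then change the signs of those three rows:

$$T = \begin{array}{rrrrr|rrrrr|r}
0 & 0 & 0 & 3 & 7 & 1 & 0 & 0 & 0 & 0 & \text{const.}\\
1 & 0 & 0 & 3 & 7 & 0 & -1 & 0 & 0 & 0 & \text{const.}\\
0 & 1 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 2\\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 1\\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 5\\
0 & 0 & 0 & -4 & -8 & 0 & 1 & 1 & 1 & 0 & -(3 + \text{const.})
\end{array}$$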


Let $T_{29}$ and $T_{22}$ be the array $T$ with the constant equal to 29 and 22, respectively. If we apply the simplex algorithm to $T_{29}$, the last row of the final tableau is

      (MAX T29)[6;]
0 0 1 5E¯18 0 0.75 1 1 5E¯18 1.75 ¯0.5


Taking the fourth and ninth components to be zero, we see that the maximum value of the objective function is $-x_1'' - x_2'' - x_3'' = -0.5 < 0$. Thus the set of points is empty (Figure 6.10). For $T_{22}$, however, we have


      (F←MAX T22)[6;]
[This row is not legible in the source.]

and since $-x_1'' - x_2'' - x_3'' = 0$, we have located a point of the set. Dropping the columns for the artificial variables to give a somewhat more manageable display, we have


[This display is not legible in the source.]


Thus $(x, y) = (2, 16/7) \approx (2, 2.3)$, the point at which the line $3x + 7y = 22$ cuts the line $x = 2$ (because the slack variable $x_3' = 0$). But what of the other points of intersection?

Recall that if the last row of the pivot column contains a negative number, 
then as we move to the next vertex the value of the objective function increases. If 
this entry is positive, however, the value will decrease, and if it is zero, then 
moving to the next vertex will not change the value of the objective function. The 
zero entry in the last row and fifth column above indicates that there is another vertex at which the value of $-x_1'' - x_2'' - x_3''$ is zero. Pivoting, we obtain

[This tableau is not legible in the source.]
So $(x, y) = (13/4, 7/4) = (3.25, 1.75)$ is also a point of the set. It is the point at which the line $3x + 7y = 22$ cuts the line $x + y = 5$ (because the slack variable $x_5' = 0$). Since the set of points at which the maximum is attained is convex, the entire edge between this vertex and the previous vertex has $x_1'' + x_2'' + x_3'' = 0$, and so the corresponding values of $x$ and $y$,

$$\begin{bmatrix} x\\ y \end{bmatrix} = (1 - t)\begin{bmatrix} 2\\ 16/7 \end{bmatrix} + t\begin{bmatrix} 13/4\\ 7/4 \end{bmatrix}, \qquad 0 \le t \le 1,$$

lie on the intersection of the line and the triangle (alternatively, note that the intersection of a triangle and a line is convex).
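Because $3x + 7y$ is linear, its values over the triangle fill exactly the interval between its smallest and largest values at the three vertices. That gives a quick independent check on the tableau conclusions of Example 6.20. The Python sketch below is a cross-check added here, not the book's tableau method; the function names are hypothetical. It tests each constant and computes, with exact fractions, where the line crosses the sides of the triangle.

```python
from fractions import Fraction

VERTICES = [(2, 1), (4, 1), (2, 3)]            # triangle of Example 6.20

def line_meets_triangle(const):
    """3x + 7y is linear, so over the triangle it takes every value between its
    smallest and largest vertex values, and no others."""
    values = [3 * x + 7 * y for x, y in VERTICES]
    return min(values) <= const <= max(values)

def crossing_points(const):
    """Points where the line 3x + 7y = const meets the sides of the triangle."""
    points = set()
    for k in range(3):
        (x0, y0), (x1, y1) = VERTICES[k], VERTICES[(k + 1) % 3]
        f0, f1 = 3 * x0 + 7 * y0 - const, 3 * x1 + 7 * y1 - const
        if f0 == f1:                             # side parallel to the line (not the case here)
            continue
        t = Fraction(f0, f0 - f1)                # 3x + 7y - const vanishes at this fraction of the side
        if 0 <= t <= 1:
            points.add((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return sorted(points)

for const in (29, 22):
    print(const, line_meets_triangle(const),
          [(str(x), str(y)) for x, y in crossing_points(const)])
# 29 False []
# 22 True [('2', '16/7'), ('13/4', '7/4')]
```

The vertex values are 13, 19, and 27, so a constant of 29 misses the triangle while 22 cuts it along the segment between the two points found by pivoting.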


EXERCISES 6.3 


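In exercises 1 through 10 a standard maximum problem is given. Solve the problem with the simplex algorithm. Give the solution of the dual problem as well.

1 and 2. [These two exercises are printed side by side and are garbled in the source. Legible fragments: maximize $z = 18x_1 + 19x_2$; constraints $2x_1 + 2x_2 \le 18$, $3x_1 + 7x_2 \le 39$, $4x_1 + 10x_2 \le \ldots$; $x_1, x_2 \ge 0$.]

3 and 4. [Garbled in the source. Legible fragments: maximize $z = 12x_1 + \ldots$; constraints $6x_1 + 7x_2 \le 48$, $5x_1 + 8x_2 \le 50$; $x_1, x_2 \ge 0$.]

5 and 6. [Garbled in the source. Legible fragments: 5. maximize $z = 32x_1 + 37x_2$; 6. maximize $z = 50x_1 + 14x_2$; constraints $8x_1 + 5x_2 \le \ldots$, $3x_1 + 2x_2 \le 24$, $\ldots + 8x_2 \le 54$; $x_1, x_2 \ge 0$.]

7 and 8. [Garbled in the source. Legible fragments: 7. maximize $z = 60x_1 + 45x_2$ subject to $6x_1 + 3x_2 \le \ldots$, $5x_1 + 4x_2 \le \ldots$, $10x_1 + 5x_2 \le \ldots$; 8. $\ldots$, $3x_1 + 7x_2 \le 29$; $x_1, x_2 \ge 0$.]

9. Maximize $z = 120x_1 + 140x_2 + \ldots$ subject to
   $4x_1 + 10x_2 + 7x_3 \le 36$
   $9x_1 + 6x_2 + 10x_3 \le 54$
   $x_1 + 2x_2 + \ldots \le 16$
   $8x_1 + 4x_2 + \ldots x_3 \le 40$
   $x_1, x_2, x_3 \ge 0$

10. Maximize $z = 46x_1 + 75x_2 + 87x_3$ subject to
   $10x_1 + 2x_2 + x_3 \le \ldots$
   $4x_1 + \ldots x_2 + 2x_3 \le 52$
   $2x_1 + 7x_2 + 9x_3 \le \ldots$
   $\ldots x_1 + \ldots x_2 + 5x_3 \le 82$
   $x_1, x_2, x_3 \ge 0$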


(Computer assignment) In exercises 11 through 15 use the function MAX to solve the standard maximum problem. Give the solution to the dual problem as well.

11. Maximize $z = 162x_1 + 150x_2 + 128x_3$ subject to
   $6x_1 + 8x_2 + 7x_3 \le 126$
   $x_1 + 8x_2 + 9x_3 \le 155$
   $10x_1 + 7x_2 + 2x_3 \le 138$
   $9x_1 + 7x_2 + 5x_3 \le 132$
   $7x_1 + 2x_2 + 5x_3 \le 84$
   $x_1, x_2, x_3 \ge 0$

12. Maximize $z = 84x_1 + \ldots x_2 + 32x_3$ subject to
   $7x_1 + 2x_2 + \ldots \le 56$
   $6x_1 + \ldots x_2 + 2x_3 \le 45$
   $9x_1 + 5x_2 + \ldots \le 100$
   $3x_1 + \ldots x_2 + 7x_3 \le 108$
   $10x_1 + 10x_2 + 5x_3 \le 163$
   $x_1, x_2, x_3 \ge 0$

13. Maximize $z = 194x_1 + \ldots x_2 + \ldots x_3$ subject to
   $6x_1 + 3x_2 + 3x_3 \le 74$
   $\ldots x_1 + 2x_2 + \ldots x_3 \le 90$
   $\ldots x_1 + 10x_2 + 10x_3 \le 210$
   $2x_1 + 6x_2 + 5x_3 \le 121$
   $3x_1 + x_2 + 2x_3 \le 43$
   $x_1, x_2, x_3 \ge 0$

14. Maximize $z = 39x_1 + 50x_2 + \ldots$ subject to
   $x_1 + \ldots x_2 + 8x_3 \le 29$
   $3x_1 + 5x_2 + \ldots \le 46$
   $\ldots x_1 + 7x_2 + 5x_3 \le 77$
   $2x_1 + 6x_2 + \ldots \le 52$
   $x_1 + 4x_2 + 6x_3 \le 30$
   $x_1, x_2, x_3 \ge 0$

15. Maximize $z = \ldots x_1 + 65x_2 + \ldots$ subject to
   $10x_1 + \ldots x_2 + 2x_3 \le \ldots$
   $8x_1 + 5x_2 + \ldots \le 90$
   $8x_1 + 3x_2 + \ldots \le 51$
   $\ldots \le 16$
   $9x_1 + 7x_2 + \ldots \le 106$
   $x_1, x_2, x_3 \ge 0$


For exercises 16 through 22 find all the vertices of the set of points (a) by using the technique of artificial variables to find one vertex of the set, (b) by pivoting on columns with zero in the last row to move along edges to other vertices while maintaining the artificial variables at zero. All sets are assumed to be in the first quadrant of $R^2$ (i.e., $x, y \ge 0$).

16 through 22. [The systems of inequalities for these exercises are not legible in the source.]


23. Let $v$ be a feasible point of the standard maximum problem (6.1) and $v' = b - Av$. Let the index vector $I$ give the nonzero components of $(v, v')$ and set $p(T) = (x_v(T), b - Ax_v(T))[I]$, where $x_v$ is the constraint flat for $v$. Show that $p(T)$ is the flat of solutions of $(A \mid I)[\,;I\,]\,z = b$.


64 Sociobiology, Game Theory, and Evolution 431 


Hint: Let $q(T)$ be the solution flat and define

$$(x_v(T), x_v'(T))[I] = q(T),$$

the other components being zero. Then $Ax_v(T) + x_v'(T) = b$, and one may assume $(x_v(0), x_v'(0)) = (v, v')$. Show that there is an $\varepsilon > 0$ such that $(x_v(T), x_v'(T))$ is feasible for $|T| < \varepsilon$.


24. (a) Show from the definition of $d_1$, $d_2$ in (6.12) that [formula not legible in the source].

(b) Suppose that an entire column of $(d_1 \mid d_2)$ in (6.12) is negative. Show that the corresponding variable may be increased indefinitely and the point will still be feasible. At the same time $c^TX$ will increase indefinitely. Thus a maximum will not exist.

25. Discuss the problem of Example 6.20 for the case where the line is
(a) $3x + 7y = 13$ (b) $3x + 7y = 27$ (c) $3x + 7y = 19$

26. (Computer assignment) Use the simplex algorithm to verify the answer given for Example 6.3.

27. (Computer assignment) Use the simplex algorithm to verify the answer given for Example 6.5.


6.4* Sociobiology, Game Theory, and Evolution 


CASTE IN THE SOCIAL INSECTS 


The social insects (ants, termites, bees, and so on) have always posed a special 
problem for evolutionary biology. Normally we say that the process of natural 
selection selects those traits that increase an individual's offspring, that spread an 
individual's genes through the population. If this is strictly the case, then how do 
sterile workers evolve? The only reasonable answer is that for such insect colonies, 
natural selection must operate on the colony as if it were, in some sense, a single 
individual. This answer was originally given by Charles Darwin, but a convincing 
mechanism for colony-level selection was not proposed until the 1960s (by W. D. 
Hamilton). 

Granted that natural selection operates on an ant colony as a whole, what is 
the evolutionary advantage of castes? E. O. Wilson has used linear programming 
techniques to study this question. 

In the simplest analysis, if we take evolution to be “survival of the fittest,” then species should evolve in such a way as to maximize their “fitness function” subject to the constraints of their environment.

Wilson applies this idea to the social insects in the following way. He takes “fitness” to be success in reproduction. An ant colony can be thought of as a single biological unit. Therefore the reproductive success (= fitness) of a colony is the number of new colonies it founds in a given period of time.

An ant colony typically contains a queen and a number of asexual workers, which may come in various types (soldiers, foragers, nurses, and so on) or castes. To found a new colony, a new queen and a number of males are hatched and sent out. Wilson asks: Why do different ant species have different proportions of the various castes in their colonies? That is, given an environment, what ratios of the various castes will maximize the production of virgin queens? He analyzes the question with linear programming techniques.

To actually write down a problem, however, he switches to a dual problem: 
Minimize the energy cost (= number of workers maintained) while maintaining a 
given level of virgin queen production. The colony is subject to the constraints of 
maintaining enough workers to meet the contingencies of the given environment 
(scarce food, cold weather, anteaters, and so on).

The cost of not meeting a particular contingency is measured in terms of the 
expected lost production of virgin queens per occurrence of the contingency times 
the expected number of occurrences of the contingency in a given period of time. 

Assume two castes and two contingencies. The probability that an occurrence of contingency 1 is not handled successfully is

$$(1 - q_{11})^{a_{11}W_1}(1 - q_{12})^{a_{12}W_2}$$

where $q_{ij}$ is the probability that a worker of caste $j$ is able to cope with contingency $i$ when the worker encounters the contingency (hence $1 - q_{ij}$ is the probability of failure), $W_j$ is the total weight of workers of caste $j$ in an average colony, and $a_{ij}$ is the average number of contacts a worker of caste $j$ can be expected to have with contingency $i$ during the existence of the contingency, divided by the weight of a worker of caste $j$ (so that $a_{ij}W_j$ is the expected number of encounters between caste $j$ and contingency $i$).

Similarly, the probability that an occurrence of contingency 2 is not handled successfully is

$$(1 - q_{21})^{a_{21}W_1}(1 - q_{22})^{a_{22}W_2}$$


The corresponding costs of not meeting the contingencies are

$$k_1X_1(1 - q_{11})^{a_{11}W_1}(1 - q_{12})^{a_{12}W_2}, \qquad k_2X_2(1 - q_{21})^{a_{21}W_1}(1 - q_{22})^{a_{22}W_2}$$

where $k_i$ is the frequency of contingency $i$ for the given period of time and $X_i$ is the average cost per failure to meet contingency $i$. If $F_i$ is the highest tolerable cost due to contingency $i$, then the constraints are

$$\begin{aligned}
k_1X_1(1 - q_{11})^{a_{11}W_1}(1 - q_{12})^{a_{12}W_2} &\le F_1\\
k_2X_2(1 - q_{21})^{a_{21}W_1}(1 - q_{22})^{a_{22}W_2} &\le F_2
\end{aligned}$$

Since the factors $1 - q_{ij}$ are less than 1, the left-hand sides never exceed $k_iX_i$; we must therefore have $F_i < k_iX_i$, or the constraint cannot be active. Although these are nonlinear inequalities, the boundary curves of the feasible set they define are, in fact, straight lines. The only variables are $W_1$ and $W_2$, and, taking logarithms, we obtain the standard minimum problem

$$\begin{aligned}
\text{minimize}\quad & u = W_1 + W_2\\
\text{subject to}\quad & -[a_{11}\ln(1 - q_{11})]W_1 - [a_{12}\ln(1 - q_{12})]W_2 \ge -\ln\frac{F_1}{k_1X_1}\\
& -[a_{21}\ln(1 - q_{21})]W_1 - [a_{22}\ln(1 - q_{22})]W_2 \ge -\ln\frac{F_2}{k_2X_2}\\
& W_1, W_2 \ge 0
\end{aligned} \qquad (6.22)$$

Since $\ln \alpha < 0$ for $0 < \alpha < 1$, the logarithms in (6.22) are all negative, and so the coefficients and the constant terms are all positive.

The feasible set will look something like Figure 6.11.

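If we write the system (6.22) in matrix form,

$$\text{minimize}\quad u = \begin{bmatrix}1 & 1\end{bmatrix}\begin{bmatrix}W_1\\ W_2\end{bmatrix} \qquad \text{subject to}\quad \begin{bmatrix}\alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22}\end{bmatrix}\begin{bmatrix}W_1\\ W_2\end{bmatrix} \ge \begin{bmatrix}c_1\\ c_2\end{bmatrix}, \quad W \ge 0,$$

then $\alpha_{ij} = -a_{ij}\ln(1 - q_{ij})$ is a measure of the effectiveness of the $j$th caste in dealing with contingency $i$ (it increases with both $a_{ij}$ and $q_{ij}$), and $c_i = -\ln(F_i/(k_iX_i))$. Let us number the castes so that caste $i$ specializes in contingency $i$. Since caste 1 specializes in contingency 1, that is, is better at contingency 1 than caste 2 is, $\alpha_{11} > \alpha_{12}$, and so $c_1/\alpha_{11} < c_1/\alpha_{12}$. This means that the line for contingency 1 meets the $W_2$ axis at an angle of less than $45^\circ$. Similarly, the line for contingency 2 meets the $W_1$ axis at less than $45^\circ$. Since $b = [1\;\,1]^T$, the lines $u = \text{const.}$ meet the axes at $45^\circ$, and hence (Figure 6.11) the minimum occurs at the vertex $w$ where the two constraint lines intersect.

FIGURE 6.11

FIGURE 6.12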
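The passage from the nonlinear constraints to (6.22) is mechanical, so it can be sketched in a few lines of Python. All numerical parameter values below are hypothetical, chosen only to make the arithmetic easy; they are not Wilson's data, and the function names are likewise illustrative. The sketch computes $\alpha_{ij} = -a_{ij}\ln(1 - q_{ij})$ and $c_i = -\ln(F_i/(k_iX_i))$ and then, instead of running the simplex method, compares the candidate vertices of the two-dimensional feasible set (the two axis intercepts and the intersection of the constraint lines) to find the one minimizing $W_1 + W_2$.

```python
import math

def caste_lp(a, q, k, X, F):
    """Coefficients of (6.22). Row i = contingency, column j = caste."""
    alpha = [[-a[i][j] * math.log(1 - q[i][j]) for j in range(2)] for i in range(2)]
    c = [-math.log(F[i] / (k[i] * X[i])) for i in range(2)]
    return alpha, c

def min_workers(alpha, c):
    """Minimize W1 + W2 subject to alpha W >= c, W >= 0 by checking the vertices."""
    candidates = [
        (max(c[i] / alpha[i][0] for i in range(2)), 0.0),   # vertex on the W1 axis
        (0.0, max(c[i] / alpha[i][1] for i in range(2))),   # vertex on the W2 axis
    ]
    det = alpha[0][0] * alpha[1][1] - alpha[0][1] * alpha[1][0]
    if det != 0:                                             # both constraint lines tight
        W1 = (c[0] * alpha[1][1] - alpha[0][1] * c[1]) / det
        W2 = (alpha[0][0] * c[1] - c[0] * alpha[1][0]) / det
        if W1 >= 0 and W2 >= 0:
            candidates.append((W1, W2))
    feasible = [W for W in candidates
                if all(alpha[i][0] * W[0] + alpha[i][1] * W[1] >= c[i] - 1e-12
                       for i in range(2))]
    return min(feasible, key=lambda W: W[0] + W[1])

# Hypothetical data: caste j meets contingency j twice as often as the other caste does.
alpha, c = caste_lp(a=[[2, 1], [1, 2]], q=[[0.5, 0.5], [0.5, 0.5]],
                    k=[1, 1], X=[4, 4], F=[1, 1])
W = min_workers(alpha, c)
print(W)   # approximately (0.667, 0.667)
```

With these numbers each axis vertex costs $W_1 + W_2 = 2$, while the intersection costs $4/3$, in line with the geometric argument above.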

Wilson works with Figure 6.11 to derive predictions about how caste ratios can change over evolutionary time. We will give one example.

EXAMPLE 6.21 Suppose that caste 1 becomes more efficient at its task of coping with contingency 1 while all else remains constant. This means that $\alpha_{11}$ increases while $\alpha_{12}$, $\alpha_{21}$, $\alpha_{22}$, $c_1$, $c_2$, and $b$ remain fixed. Suppose that $\alpha_{11}' > \alpha_{11}$ is the new coefficient. Then the equilibrium will be shifted to the vertex $w'$ in Figure 6.12.

Thus the colony will be able to maintain the same level of virgin queen production with fewer workers ($b^Tw' < b^Tw$), and the relative proportions of the two castes will change. But the more efficient caste has decreased in relative size and the other caste has increased.

This is precisely the opposite of what would be expected if selection operated 
at the level of the individual rather than the colony. 

Thus the theory makes a clear prediction that could conceivably be used to 
test the hypothesis that selection takes place at the colony level. If selection takes 
place at the colony level and a caste increases in efficiency, then the caste ratios 
will shift one way. If, on the other hand, selection takes place on the individual 
level, then the ratios will shift the other way.
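Example 6.21 can be reproduced numerically with made-up coefficients. In the Python sketch below, the values of $\alpha$ and $c$ are hypothetical, chosen so that the arithmetic is exact, and `vertex` is an illustrative helper name. Both castes start out equally effective; raising $\alpha_{11}$ from 2 to 3 moves the vertex from $w = (1, 1)$ to $w' = (3/5, 6/5)$. The sketch assumes, as in Figure 6.12, that the minimum stays at the intersection of the two constraint lines (for these numbers it does: each axis vertex costs 3).

```python
from fractions import Fraction

def vertex(alpha, c):
    """Intersection of the lines alpha[i][0]*W1 + alpha[i][1]*W2 = c[i] (Cramer's rule)."""
    det = alpha[0][0] * alpha[1][1] - alpha[0][1] * alpha[1][0]
    W1 = Fraction(c[0] * alpha[1][1] - alpha[0][1] * c[1], det)
    W2 = Fraction(alpha[0][0] * c[1] - c[0] * alpha[1][0], det)
    return W1, W2

c = [3, 3]                               # hypothetical constant terms c1, c2
w = vertex([[2, 1], [1, 2]], c)          # original coefficients
w_new = vertex([[3, 1], [1, 2]], c)      # caste 1 more efficient: alpha11 = 2 -> 3

for W in (w, w_new):
    total = W[0] + W[1]
    print(f"W = ({W[0]}, {W[1]}), total = {total}, caste 1 share = {W[0] / total}")
# W = (1, 1), total = 2, caste 1 share = 1/2
# W = (3/5, 6/5), total = 9/5, caste 1 share = 1/3
```

The total falls from 2 to 9/5, and the share of the improved caste falls from one half to one third, which is the colony-level prediction described above.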


GAME THEORY 


Let us begin with the children’s game “rock-scissors-paper.” This is a game for two players. At a signal, each player shows his right hand either flat (paper), with two fingers extended (scissors), or clenched into a fist (rock). The winner is determined by the scheme: paper covers rock, rock breaks scissors, scissors cut paper.

The winner usually inflicts some minor physical punishment upon the loser, but let us move the example into the realm of economic theory by assuming that the loser pays the winner some amount, say a dime. Then the various possibilities are summarized in Table 6.1, where 1 = rock, 2 = scissors, 3 = paper.

TABLE 6.1
                      Player II
                   1      2      3
              1    0     10    -10
  Player I    2  -10      0     10
              3   10    -10      0

The entries in Table 6.1 are the amounts, in cents, that player II pays to player I. A negative amount indicates that player I pays player II.

The game of rock-scissors-paper is an example of a two-person, zero-sum matrix game. It is a zero-sum game because what one player loses, the other wins. Table 6.1 is called the payoff matrix of the game.

The question to be answered in such a game is: What strategy will maximize a player’s winnings, or at least minimize the losses?

If the first player always shows a fist, say, presumably the second player will figure this out eventually and always choose paper. If he is not to lose in the long run, then, a player must vary his choices and do so in a way that his opponent cannot predict. If, for example, one player decided to cycle through the pattern rock-rock-paper-scissors, presumably his opponent would eventually notice the pattern and begin to play paper-paper-scissors-rock. Therefore, if a player decides that his best strategy is to play rock twice as often as paper and twice as often as scissors, he should devise a scheme for choosing each particular play at random but in such a way that in the long run the proportions of the time that each play is chosen come out to be rock-rock-scissors-paper. This could be done, for example, by building a spinner on the pattern of Figure 6.13 and making each play by spinning the pointer.

Presumably even with this scheme the opponent would eventually discover the proportions with which the plays were being made. How well can a player do, then, assuming that his opponent has figured out his strategy in the sense that he knows how frequently each play will be made in the long run but cannot tell which play will be made at any particular time?

Here is the general set-up. A matrix $A$, the payoff matrix, is given. It need not be square. For example, the first player might be restricted to scissors and paper but win the ties. The $ij$ entry of the payoff matrix is the payoff to player I (the row player) when player I makes the play $i$ and player II (the column player) makes the play $j$.




FIGURE 6.13 


A strategy for a player is a plan to make each play a fixed proportion of the time in the long run. A strategy then can be represented by a vector $p$, where $p[i]$ is the fraction of the time that the player will make the $i$th play in the long run.

For example, if a player wishes to play the proportions of Figure 6.13, then his strategy is given by the vector $p = (\tfrac12, \tfrac14, \tfrac14)$.

Notice that the components of such a vector are always nonnegative and sum to one.

DEFINITION 6.9

1. A probability vector $p$ is a vector $p \ge 0$ whose components sum to 1.

2. Let $A$, the payoff matrix for a game, be $m$-by-$n$. Then a strategy for player I is a probability vector with $m$ components. A strategy for player II is a probability vector with $n$ components.

3. A pure strategy is a probability vector with one component equal to 1. (It follows that the other components are zero.)

4. The expected payoff from strategies $p$ and $q$ is

$$E(p, q) = p^TAq$$

The expected payoff is the average payoff per play to player I in the long run. This is because the probability that player I will make the $i$th play is $p[i]$ and the probability that player II will make the $j$th play is $q[j]$; since the two players choose independently, the probability that the $ij$th payoff will be made is $p[i]q[j]$. This means that in the long run the payoff $A[i;j]$ is made the fraction $p[i]q[j]$ of the time. It follows that the average payoff will be

$$E(p, q) = \sum_{i,j} A[i;j]\,p[i]\,q[j] = p^TAq$$


EXAMPLE 6.22 For the game of rock-scissors-paper the payoff matrix is

$$A = \begin{bmatrix} 0 & 10 & -10\\ -10 & 0 & 10\\ 10 & -10 & 0 \end{bmatrix}$$

If player I chooses to play rock all the time and player II chooses to play scissors all the time, then we have the pure strategies $p = (1, 0, 0)$, $q = (0, 1, 0)$, and the expected payoff is

$$E(p, q) = \begin{bmatrix}1 & 0 & 0\end{bmatrix}\begin{bmatrix} 0 & 10 & -10\\ -10 & 0 & 10\\ 10 & -10 & 0 \end{bmatrix}\begin{bmatrix}0\\ 1\\ 0\end{bmatrix} = 10$$

Player I can expect to win every game.

If, on the other hand, player II chooses to make each possible play with equal frequency, then $q = (\tfrac13, \tfrac13, \tfrac13)$ and

$$E(p, q) = \begin{bmatrix}1 & 0 & 0\end{bmatrix}\begin{bmatrix} 0 & 10 & -10\\ -10 & 0 & 10\\ 10 & -10 & 0 \end{bmatrix}\begin{bmatrix}\tfrac13\\ \tfrac13\\ \tfrac13\end{bmatrix} = 0$$

and in the long run the players should break even.
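The two expected payoffs in Example 6.22 can be checked directly from the double sum in Definition 6.9. The short Python sketch below (the function name is hypothetical) evaluates $\sum_{i,j} A[i;j]\,p[i]\,q[j]$ with exact fractions.

```python
from fractions import Fraction

A = [[0, 10, -10],
     [-10, 0, 10],
     [10, -10, 0]]            # 1 = rock, 2 = scissors, 3 = paper; payoff to player I

def expected_payoff(p, A, q):
    """E(p, q) = sum over i, j of A[i][j] p[i] q[j], which equals p^T A q."""
    return sum(A[i][j] * p[i] * q[j] for i in range(len(p)) for j in range(len(q)))

third = Fraction(1, 3)
print(expected_payoff([1, 0, 0], A, [0, 1, 0]))               # 10
print(expected_payoff([1, 0, 0], A, [third, third, third]))   # 0
```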


Now let us analyze the situation from player II's point of view. If player II fixes on a strategy $q$, then player I will figure it out (this is a basic assumption in game theory). Thus player I will adjust his strategy, $p$, to the known strategy $q$: he will pick a strategy $p_{\mathrm{II}}$ (the subscript indicates that player II's point of view is being discussed) such that

$$p_{\mathrm{II}}^TAq = \max_p\, p^TAq \qquad (q \text{ fixed})$$

Since player II understands this, he must pick the particular $q = q_{\mathrm{II}}$ such that

$$\text{player II:}\qquad p_{\mathrm{II}}^TAq_{\mathrm{II}} = \min_q \max_p\, p^TAq$$

This is what the players should do, provided that $p_{\mathrm{II}}$ and $q_{\mathrm{II}}$ exist. In fact they do exist, as we shall see below.

Player I may go through the same line of reasoning to arrive at the equation

$$\text{player I:}\qquad p_{\mathrm{I}}^TAq_{\mathrm{I}} = \max_p \min_q\, p^TAq$$

Now is it possible for each player to settle on a fixed strategy, or will they have to keep shifting about? If $p_{\mathrm{I}} = p_{\mathrm{II}}$ and $q_{\mathrm{I}} = q_{\mathrm{II}}$, then they can settle on a fixed strategy. More generally (the $p$'s and $q$'s need not be unique), they can settle on a fixed strategy if

$$\max_p \min_q\, p^TAq = \min_q \max_p\, p^TAq \qquad (6.23)$$

Equation (6.23) is true and is known as the minimax theorem. One part of it is easy. For any $p$ and $q$,

$$\min_{q'} p^TAq' \le p^TAq \le \max_{p'} p'^TAq$$

The left side depends only on $p$ and the right side only on $q$, so taking the maximum over $p$ on the left and the minimum over $q$ on the right, we do have

$$\max_p \min_q\, p^TAq \le \min_q \max_p\, p^TAq$$
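The easy inequality can also be seen numerically. For a fixed $p$, the quantity $p^TAq$ is linear in $q$, so its minimum over probability vectors $q$ is the smallest component of the row vector $p^TA$; likewise, for fixed $q$, the maximum over $p$ is the largest component of $Aq$. The Python sketch below (helper names hypothetical) first shows that with pure strategies alone rock-scissors-paper has a gap between the two sides (player I can guarantee only $-10$, player II can hold player I to $10$), which is why mixed strategies are needed, and then confirms the sandwich for many randomly chosen mixed strategies, using exact fractions so that rounding cannot spoil the comparison.

```python
import random
from fractions import Fraction

A = [[0, 10, -10],
     [-10, 0, 10],
     [10, -10, 0]]            # rock-scissors-paper, payoff to player I

# Pure strategies only: max_i min_j A[i][j] and min_j max_i A[i][j].
lower_pure = max(min(row) for row in A)
upper_pure = min(max(A[i][j] for i in range(3)) for j in range(3))
print(lower_pure, upper_pure)   # -10 10

def random_strategy(n, rng):
    """A random probability vector with rational components."""
    weights = [rng.randint(1, 100) for _ in range(n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def sandwich_holds(A, p, q):
    """Check min_q' p^T A q' <= p^T A q <= max_p' p'^T A q."""
    m, n = len(A), len(A[0])
    pA = [sum(p[i] * A[i][j] for i in range(m)) for j in range(n)]   # row vector p^T A
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]   # column vector A q
    value = sum(pA[j] * q[j] for j in range(n))                      # p^T A q
    return min(pA) <= value <= max(Aq)

rng = random.Random(0)
print(all(sandwich_holds(A, random_strategy(3, rng), random_strategy(3, rng))
          for _ in range(1000)))   # True
```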


The opposite inequality takes more work. The trick is to set up a closely related linear programming problem, one that can be solved to find the optimal strategies of the game given by $A$.

PROPOSITION 6.15 Let $A$ be the payoff matrix of a two-person zero-sum game. Let $v$, $w$ be solutions of the pair of dual linear programming problems

$$\begin{aligned}
&\text{maximize } z = c^TX && \text{minimize } u = b^TY\\
&\text{subject to } AX \le b && \text{subject to } A^TY \ge c \qquad (6.24)\\
&\phantom{\text{subject to }} X \ge 0 && \phantom{\text{subject to }} Y \ge 0
\end{aligned}$$

where $b = (1, \ldots, 1)$ and $c = (1, \ldots, 1)$. Let $\theta = b^Tw = c^Tv$. Then $p^* = w \div \theta$ and $q^* = v \div \theta$ are optimal strategies for the game, and the expected payoff is $E(p^*, q^*) = \theta^{-1}$.

Proof Let $v$ and $w$ be solutions; then $b^Tw = c^Tv$ by Proposition 6.12. Now $b^Tw$ is the sum of the components of $w$ and $c^Tv$ is the sum of the components of $v$, and both equal $\theta$. Since $v, w \ge 0$, we have $\theta = 0$ if and only if $v = w = 0$, which violates $A^Tw \ge c$. Thus $\theta > 0$, and $p^* = w \div \theta$, $q^* = v \div \theta$ are probability vectors.

Since $Av \le b$, if $p$ is any probability vector then [Proposition 6.4(a)] $p^TAv \le p^Tb = 1$. Similarly $w^TAq \ge 1$ for any probability vector $q$. Dividing by $\theta$, we have

$$p^TAq^* \le \theta^{-1} \le p^{*T}Aq$$

for all probability vectors $p, q$. Now the pattern of zero components in $v$ and $w$ (Proposition 6.12) implies

$$\theta = b^Tw = w^Tb = w^TAv = (A^Tw)^Tv = c^Tv = \theta$$

Dividing by $\theta^2$ gives $\theta^{-1} = p^{*T}Aq^*$. Thus

$$p^{*T}Aq^* \le p^{*T}Aq \ \text{ for all } q \quad\text{shows}\quad p^{*T}Aq^* = \min_q p^{*T}Aq$$

so

$$p^{*T}Aq^* = \min_q p^{*T}Aq \le \max_p \min_q p^TAq \le \min_q \max_p p^TAq \le \max_p p^TAq^* = p^{*T}Aq^*$$

so the minimax theorem holds and we can take $p_{\mathrm{I}} = p_{\mathrm{II}} = p^*$ and $q_{\mathrm{I}} = q_{\mathrm{II}} = q^*$.


There is a subtle problem with Proposition 6.15. The proposition states that if the dual problems have solutions $v$, $w$, then $p^* = w \div \theta$ and $q^* = v \div \theta$ are optimal strategies. How do we know that the pair of problems has solutions? In fact, they need not have solutions (exercise 14), but we can always shift to a pair that does have solutions.

Since $b > 0$, the maximum problem has the vertex $X = 0$ (Proposition 6.14). Thus the maximum problem, and hence the minimum problem also, will fail to have a solution only if $c^TX$ is unbounded above. Now if $w'$ is any feasible point of the minimum problem, then by Proposition 6.12, $c^TX \le b^Tw'$, and hence $c^TX$ is bounded above. Thus the problems will have solutions if the feasible set of the minimum problem can be shown to be nonempty. If $A > 0$ (every entry strictly larger than zero), we can pick a $w'$ with sufficiently large components to guarantee $A^Tw' \ge c$, $w' \ge 0$.

The trick is the observation that $A$ and $A + a$, for any scalar $a$ added to every entry of $A$, define the same game in the sense that

$$p^T(A + a)q = p^TAq + a$$

(because the components of $p$ and of $q$ each sum to 1), so that the fixed constant $a$ is added to all the expected payoffs. This means that $p^*$, $q^*$ will be the same for both $A$ and $A + a$. Thus to apply Proposition 6.15 we first add a large enough $a$ to make all entries of $A + a$ positive.
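Proposition 6.15, together with the shift just described, is a complete recipe for solving a matrix game. The Python sketch below illustrates it; it is not the book's APL workspace, and `solve_game` and `pivot` are hypothetical names. The sketch adds the constant $a = 1 - \min_{i,j} A[i;j]$ (which is 11 for rock-scissors-paper, as in Example 6.23), so every entry of $A + a$ is at least 1. It then solves maximize $c^TX$ subject to $(A + a)X \le b$, $X \ge 0$ by the tableau simplex method with Bland's rule, reads $v$ from the right-hand column and $w$ from the slack columns of the bottom row, and returns $p^* = w \div \theta$, $q^* = v \div \theta$, and the value $\theta^{-1} - a$.

```python
from fractions import Fraction

def pivot(T, r, c):
    """Gauss-Jordan pivot on T[r][c], in place."""
    p = T[r][c]
    T[r] = [x / p for x in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [x - f * y for x, y in zip(T[i], T[r])]

def solve_game(A):
    """Return (p*, q*, value) for the zero-sum game with payoff matrix A."""
    m, n = len(A), len(A[0])
    a = 1 - min(min(row) for row in A)       # shift so every entry of A + a is >= 1
    # Tableau [A + a | I | b] with b = (1, ..., 1); the bottom row is -c = (-1, ..., -1).
    T = [[Fraction(A[i][j] + a) for j in range(n)]
         + [Fraction(int(i == k)) for k in range(m)] + [Fraction(1)]
         for i in range(m)]
    T.append([Fraction(-1)] * n + [Fraction(0)] * (m + 1))
    basis = list(range(n, n + m))             # slack variables are basic at X = 0
    while True:
        negative = [j for j in range(n + m) if T[-1][j] < 0]
        if not negative:
            break                             # optimal tableau reached
        col = negative[0]                     # Bland's rule avoids cycling
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("unbounded (cannot happen when A + a > 0)")
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        pivot(T, row, col)
        basis[row] = col
    theta = T[-1][-1]                         # c^T v = b^T w
    v = [Fraction(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            v[j] = T[i][-1]
    w = T[-1][n:n + m]                        # dual solution from the slack columns
    return [x / theta for x in w], [x / theta for x in v], 1 / theta - a

for A in ([[0, 10, -10], [-10, 0, 10], [10, -10, 0]],   # Example 6.23
          [[10, 10, -10], [10, -10, 10]]):               # Example 6.24
    p, q, value = solve_game(A)
    print(*p, "|", *q, "|", value)
# 1/3 1/3 1/3 | 1/3 1/3 1/3 | 0
# 1/2 1/2 | 0 1/2 1/2 | 0
```

Exact arithmetic gives the value 0 exactly, where a floating-point machine reports a round-off residue.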


EXAMPLE 6.23 Consider the game of rock-scissors-paper. The payoff matrix is

$$A = \begin{bmatrix} 0 & 10 & -10\\ -10 & 0 & 10\\ 10 & -10 & 0 \end{bmatrix}$$

Adding 11 to each component gives

$$A + 11 = \begin{bmatrix} 11 & 21 & 1\\ 1 & 11 & 21\\ 21 & 1 & 11 \end{bmatrix}$$

Since $b = (1, 1, 1) > 0$, a starting tableau is

$$T = \begin{array}{rrr|rrr|r}
11 & 21 & 1 & 1 & 0 & 0 & 1\\
1 & 11 & 21 & 0 & 1 & 0 & 1\\
21 & 1 & 11 & 0 & 0 & 1 & 1\\
-1 & -1 & -1 & 0 & 0 & 0 & 0
\end{array}$$

and a final tableau is (entries that are round-off errors of order $10^{-19}$ are shown as 0)

      ⎕←F←MAX T

$$\begin{array}{rrr|rrr|r}
0 & 1 & 0 & 0.043 & 0.010 & -0.023 & 0.030\\
0 & 0 & 1 & -0.023 & 0.043 & 0.010 & 0.030\\
1 & 0 & 0 & 0.010 & -0.023 & 0.043 & 0.030\\
0 & 0 & 0 & 0.030 & 0.030 & 0.030 & 0.091
\end{array}$$

Thus the optimal strategies $q^*$ and $p^*$ are

      F[3 1 2;7]÷F[4;7]
0.33 0.33 0.33
      F[4;4 5 6]÷F[4;7]
0.33 0.33 0.33

And the expected payoff is, remembering to subtract $a = 11$, $E(p^*, q^*) = \theta^{-1} - 11 = 1/F[4;7] - 11$, which the machine computes as 2.8E¯17, that is, zero up to round-off.

Thus, as one would expect, the optimal strategies are for each player to choose rock, scissors, and paper with equal frequency and, in the long run, both will break even.


EXAMPLE 6.24 Suppose the first player in rock-scissors-paper is allowed to choose only rock and paper, say, but by way of compensation is allowed to win the ties. What are the best strategies and expected payoff now?

The new payoff matrix (rows: rock and paper for player I; columns: rock, scissors, and paper for player II) is

$$A = \begin{bmatrix} 10 & 10 & -10\\ 10 & -10 & 10 \end{bmatrix}$$

and, adding 11 to $A$, starting and final tableaus for the corresponding maximum problem are

$$T = \begin{array}{rrr|rr|r}
21 & 21 & 1 & 1 & 0 & 1\\
21 & 1 & 21 & 0 & 1 & 1\\
-1 & -1 & -1 & 0 & 0 & 0
\end{array}$$

      ⎕←F←MAX T

$$\begin{array}{rrr|rr|r}
0.95 & 1 & 0 & 0.048 & -0.0023 & 0.045\\
0.95 & 0 & 1 & -0.0023 & 0.048 & 0.045\\
0.91 & 0 & 0 & 0.045 & 0.045 & 0.091
\end{array}$$

The optimal vectors $q^*$ and $p^*$ are

      0,F[1 2;6]÷F[3;6]
0 0.5 0.5
      F[3;4 5]÷F[3;6]
0.5 0.5

and the expected payoff is $\theta^{-1} - 11 = 1/F[3;6] - 11 = 0$.

Thus the first player's optimal strategy is to play rock and paper with equal frequency, but the second player's optimal strategy is to play scissors and paper with equal frequency and avoid playing rock altogether (this lessens the probability of ties). In the long run both players will come out even.


From Proposition 6.15 it appears that the theory of two-person zero-sum games is a subset of the theory of linear programming, in the sense that each game has a corresponding linear program (an infinite number, in fact) that solves it. In fact, the theories are coextensive in this sense, because every linear programming problem may be rephrased as a two-person, zero-sum game. Thus any general method for solving such games will also solve linear programming problems.

Game theory is not used in solving the everyday problems of industry the way 
linear programming is, but it has been very influential intellectually, especially in 
the field of economics. In fact, the term “zero-sum game” has become a stock 
phrase of newspaper editorial writers. 


EVOLUTIONARILY STABLE STRATEGIES 


J. Maynard Smith has adapted game theory to the study of the evolution of behavior. His idea is to model contests between animals in game-theoretic terms. Different individuals of the same species often engage in contests over possession of various kinds of “resources,” such as food, territory, and mates. These contests are not invariably duels to the death. A great deal of bluffing can be involved, and the winner is often simply the one that yells the loudest. Gorillas, for example, scream and beat their fists upon their chests but in most cases avoid actual combat.

How does such behavior increase an individual's Darwinian fitness? By “Darwinian fitness” we simply mean reproductive success, ensuring the survival of one's genes in the population.

When animals confront each other they communicate their intent by behav- 
ioral conventions: holding the body in a certain position, showing teeth in a snarl, 
beating the chest, and so on. These displays will be referred to as conventional
fighting. Unconventional fighting, on the other hand, is actual fighting. Which 
gives a better chance of reproductive success, conventional or unconventional 
fighting? Maynard Smith models the question in terms of a payoff matrix. We will 
discuss two of his examples. 


EXAMPLE 6.25 (The Game of Hawk and Dove) Suppose that there are two types of individuals in a population, hawks and doves. The hawk strategy in a contest is to fight unconventionally, to escalate the fighting until victory or serious injury results. The dove strategy is to fight conventionally and to run if the opponent escalates the fighting.

We need some numbers to represent the various possible payoffs. Let us measure the payoff in “Darwinian fitness units” that somehow measure increase in reproductive success: the number of offspring expected to survive to reproductive age, perhaps. Consider a fixed resource. Suppose that possession of the resource is worth, say, $a > 0$ fitness units. Suppose that serious injury is worth $-b$ ($b > 0$) fitness units, and suppose that a long, unresolved contest (dove versus dove) is worth $-c$ ($c > 0$) fitness units, with $b > c$.

Now suppose that two individuals enter a contest over the resource. Two pure strategies are available: 1 = hawk and 2 = dove. The payoff matrix is

$$A = \begin{bmatrix} \tfrac12 a - \tfrac12 b & a\\ 0 & \tfrac12 a - c \end{bmatrix} \qquad (6.25)$$

This is calculated as follows. In a hawk-hawk contest we assume that player I has a 50 percent chance of winning. In hawk-hawk contests, then, the expected payoff to player I would be $\tfrac12 a - \tfrac12 b$. In a hawk-dove contest the hawk (player I) would win and the payoff would be $a$. In a dove-hawk contest the dove (player I) would run and thus get 0. In a dove-dove contest we again assume that player I has a 50 percent chance of winning (i.e., of outlasting player II), so the gain to player I is $a/2$ minus the cost $c$ of the long contest.


Now if we have a payoff matrix such as (6.25), game theory suggests that we proceed to calculate the optimal strategy for the players. But unless there is some sort of vital principle directing organisms to optimize, it is difficult to see what meaning such a computation would have. We are assuming that player I's behavior, his “chosen strategy,” is coded into his genes, and we wish to know what sort of behavior will persist over evolutionary time, that is, will not die out.

Suppose that the population consists entirely of hawks. To be definite let us take $c = 1$, $a = 2$, $b = 4$. Then the payoff matrix is

$$A = \begin{bmatrix} -1 & 2\\ 0 & 0 \end{bmatrix}$$

Since there are only hawks in the population, the expected payoff from each contest is $-1$. Now suppose that, through mutation, a dove strategist arises. To begin with there are many more hawks than doves, hence most hawks will fight with hawks and most doves will also fight with hawks. So, at least when the dove population is small, the expected payoff to a dove strategist is close to $0 > -1$. This means that the dove population will not die out but will persist in the population. This analysis holds as long as $b > a$. Pure hawk is not an evolutionarily stable strategy in a population. (Neither is pure dove: see exercise 15.)
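The invasion argument can be made concrete with a small calculation. The Python sketch below (function names hypothetical) builds the matrix (6.25) for $a = 2$, $b = 4$, $c = 1$ and computes the average payoff to a hawk and to a dove when a fraction $\beta$ of their opponents are doves. For small $\beta$ the dove's payoff, 0, exceeds the hawk's, which is close to $-1$.

```python
from fractions import Fraction

def hawk_dove(a, b, c):
    """Payoff matrix (6.25); index 0 = hawk, 1 = dove; entry = payoff to the row player."""
    return [[Fraction(a - b, 2), Fraction(a)],
            [Fraction(0), Fraction(a, 2) - c]]

def average_payoffs(A, beta):
    """Average payoff to a hawk and to a dove when a fraction beta of opponents are doves."""
    mix = [1 - beta, beta]
    return [sum(A[i][j] * mix[j] for j in range(2)) for i in range(2)]

A = hawk_dove(a=2, b=4, c=1)
print(A == [[-1, 2], [0, 0]])                     # True
for beta in (Fraction(0), Fraction(1, 100)):
    hawk, dove = average_payoffs(A, beta)
    print(beta, hawk, dove)
# 0 -1 0
# 1/100 -97/100 0
```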
The payoff matrices that arise in this context are all square. 


DEFINITION 6.10 Let $A$ be a square payoff matrix. The probability vector $p$ is an evolutionarily stable strategy if

1. $E(p, p) \ge E(q, p)$ for all strategies $q$.
2. If $E(p, p) = E(q, p)$ for some strategy $q \ne p$, then $E(p, q) > E(q, q)$.

To see what this means, suppose that the population consists entirely of individuals using strategy $p$ and that a mutation arises using strategy $q$. Suppose that the fraction of the population using $p$ is $\alpha$ and the fraction using $q$ is $\beta$ ($\alpha + \beta = 1$). If the Darwinian fitness of the individuals is $F_0$ before a series of contests, then after the series of contests the fitness for the population using strategies $p$ and $q$ is

$$\begin{aligned}
F(p) &= F_0 + \alpha E(p, p) + \beta E(p, q)\\
F(q) &= F_0 + \alpha E(q, p) + \beta E(q, q)
\end{aligned}$$

This is because the fraction of the time that an individual meets a $p$ strategist is $\alpha$ and the fraction of the time he meets a $q$ strategist is $\beta$. For evolutionary stability we want $F(p) > F(q)$. Assuming $\alpha$ much larger than $\beta$, we get the definition of evolutionarily stable strategy: the $\alpha$ terms decide the comparison unless $E(p, p) = E(q, p)$, in which case the $\beta$ terms must favor $p$.

Here are two simple propositions on evolutionarily stable strategies.

PROPOSITION 6.16 If the diagonal entry $A[i;i]$ is the strictly largest entry in its column, then pure strategy $i$ is evolutionarily stable.

PROPOSITION 6.17 Suppose that $p$ is an evolutionarily stable strategy for $A$. If $p[i] > 0$, then the $i$th component of $Ap$ is maximal.

Proof Suppose that the $i$th component of $w = Ap$ is not maximal, and let $w[k] = m$ be a maximal component, so that $k \ne i$ and $w[i] < m$. Let $q$ be the $k$th pure strategy. Then $q \ne p$ (since $q[i] = 0 < p[i]$) and $E(q, p) = q^TAp = q^Tw = w[k] = m$. But

$$\begin{aligned}
E(p, p) = p^Tw &= p[i]w[i] + \sum_{j \ne i} p[j]w[j]\\
&< p[i]m + m\sum_{j \ne i} p[j] \qquad \text{if } p[i] > 0\\
&= m\sum_j p[j] = m = E(q, p)
\end{aligned}$$

which contradicts condition 1 of Definition 6.10.


EXAMPLE 6.26 Let 


1-1 -1 
Oo 2 -2 
=i! alae?! 


Then all three pure strategies are evolutionarily stable by Proposition 6.16. A
population chancing to start off with any one of them would persist with it because,
although the others are also stable, once one is established, the others
cannot arise.


The condition of Proposition 6.17 is necessary for p to be evolutionarily stable 
but it is not sufficient. 


EXAMPLE 6.27 (The Game of Hawk, Dove, and Bourgeois) A third type of strategy
that appears to be important in nature is termed bourgeois by Maynard
Smith. If you “own” the resource (nest, territory, mate), play hawk; otherwise,
play dove.

Assume that in a given contest each contestant has an equal probability of
being the owner; for example, the first frog that arrives at the lily pad is the
“owner.” Then the expected payoff in a hawk-bourgeois contest is the average of
the hawk-hawk and hawk-dove payoffs:




and as long as b > a, that is, as long as the penalty for serious injury is greater than
the gain from victory, Proposition 6.16 shows that the bourgeois strategy is
evolutionarily stable. ∎


Notice that in the last example “ownership” is used to settle the contests. 
Ownership has nothing to do with strength, ferocity, or intelligence. It is the 
evolutionary equivalent of settling an argument by flipping a coin. 


EXERCISES 6.4


Caste in the Social Insects 
1. Show that if there are two castes but only one contingency, then the caste least 
effective in dealing with the contingency will die out. 


2. Suppose there are two castes but more than two contingencies. Show that the caste
ratios will be entirely determined by the two most important contingencies.

3. Suppose that contingency 2 increases in frequency or importance. Suppose that caste
i is specialized for contingency i. Show that although contingency 1 remains as frequent
and important as ever, caste 1 can die out.


4. Show that if there is one caste and two contingencies, then it is to the species'
advantage to evolve two castes, one specialized to each contingency, because in the latter
case the total weight of workers can be less.

5. Assume two castes and two contingencies. Suppose that the castes are relatively 
unspecialized (Le., a), > ay but not much greater, and similarly for the second curve), 
Show that a relatively small long-term change in the frequency or importance of one of 
the contingencies can result in a large shift in caste ratios 

6. Assume two castes and two contingencies. Suppose that the castes are quite special- 
ized (¢.g., a,, quite a bit larger than a,, and similarly for the second curve). Show that a 
relatively Large long-term shift in the importance or frequency of one of the contingencies 
will produce little shift in the caste ratios 


Game Theory 
7. Show that the optimal strategies for a given game need not be unique. 
Hint: Examples 6.22 and 6.23. 


8. Let A be a payoff matrix. The entry A[i;j] is called a saddle point if it is the smallest
entry in its row and the largest entry in its column.
(a) Show that if θ = A[i;j] is a saddle point, then taking p* to be the ith pure
strategy and q* to be the jth pure strategy gives a solution to the game.
Hint: By adding a to A, assume θ > 0. Show that w = p*/θ, v = q*/θ give solutions
to the corresponding dual problems.
(b) Show that all saddle points have the same value.
Hint: E(p*, q*) is the same for all solution pairs p*, q*.
9. Show that

$$E(p^*, q) \ge E(p^*, q^*) \ge E(p, q^*)$$

for all p, q if and only if p* and q* are optimal vectors for player I and player II.
Hint: The proof of Proposition 6.15.




10. A matrix is skew-symmetric if Aᵀ = −A. Suppose that the payoff matrix is
skew-symmetric.
(a) Show that E(p, q) = −E(q, p) for all probability vectors p, q.
Hint: Since pᵀAq is 1 by 1, pᵀAq = (pᵀAq)ᵀ.
(b) Show that E(p*, q*) = 0.
Hint: By exercise 9 and part (a), pᵀAq* ≤ p*ᵀAq* ≤ p*ᵀAq implies
q*ᵀAp ≥ q*ᵀAp* ≥ qᵀAp*, so (q*, p*) is also a solution. Then, since all solutions
give the same value, E(p*, q*) = E(q*, p*) = −E(p*, q*), so E(p*, q*) = 0.
11. Alter the rock-scissors-paper payoffs (Example 6.22) as follows: rock-scissors, 30
cents to rock; scissors-paper, 20 cents to scissors; paper-rock, 10 cents to paper.
(a) Write the payoff matrix and show that in the long run the players, if they play
their optimal strategies, will break even.
Hint: Exercise 10.
(b) (Computer assignment) Use the simplex algorithm to compute a pair of optimal
strategies.
12. (Computer assignment) Suppose that the rules of rock-scissors-paper (Example
6.22) are altered so that player I wins 10 cents for all ties but pays 20 cents when he loses.
All other payoffs remain the same. What are the optimal strategies and expected payoffs?
13. (Computer assignment) Suppose that the rules of rock-scissors-paper (Example
6.22) are altered so that player I gets 10 cents for rock-rock and paper-paper ties and
player II gets 30 cents for scissors-scissors ties. All other payoffs remain the same. What are
the optimal strategies and the expected payoff?
14. (Computer assignment) Show that if a constant is not added to the payoff matrix A of
exercise 11, then the dual linear programming problems have no solution.


Evolutionarily Stable Strategies 
15. Prove Proposition 6.16. 


16. Show that the pure dove strategy is not evolutionarily stable for the hawk-dove 
game. 


17. Verify the payoff matrix of Example 6.27.


18. Show that if, in the payoff matrix A, A[j;i] > A[i;i] for some j, then pure strategy i
cannot be evolutionarily stable.
Hint: Proposition 6.17.


19. (a) Show that A and A + a have the same evolutionarily stable strategies for any
matrix A and scalar a. Given a 2-by-2 matrix A, add a scalar to get

$$A = \begin{bmatrix} 0 & b \\ a & c \end{bmatrix}$$

(b) Show that if a < 0 or b < c, then A has an evolutionarily stable strategy.
Hint: Proposition 6.16.
(c) Assume a ≥ 0, c ≤ b. If a + b ≠ c, show that (p, 1 − p) is an evolutionarily
stable strategy where p = (b − c)/(a + b − c).
(d) Continuing (c), show that if a + b = c, then A has constant columns.

20. Does the matrix

$$A = \begin{bmatrix} a & b \\ a & b \end{bmatrix}$$

have evolutionarily stable strategies?


21. Find an evolutionarily stable strategy for the hawk-dove game.
Hint: Exercise 19.




REFERENCES 


1. Wilson, E. O., The Insect Societies. Cambridge: Harvard University Press, 1971.
2. Wilson, E. O., Sociobiology. Cambridge: Harvard University Press, 1975.
3. Maynard Smith, J., “The Evolution of Behavior,” Scientific American, September
1978, p. 176.
4. Maynard Smith, J., J. Theoretical Biology, 47 (1974), 209.
5. Dawkins, R., The Selfish Gene. New York: Oxford University Press, 1976.
6. Axelrod, R., and Hamilton, W. D., “The Evolution of Cooperation,” Science 211
(1981), 1390-1396.


References 1 and 2 are for further discussion of caste in the social insects. References 3, 4,
and 5 are for the material on evolutionarily stable strategies. Pages 74-94 of reference 5 are
relevant, and the mathematics is in reference 4. Reference 6 applies game theory,
evolutionarily stable strategies, and computer simulations to investigate the evolution of
cooperation between unrelated individuals, for example, the fungus and alga that form a
lichen.


CHAPTER SEVEN 


Eigenvalues and 
Eigenvectors 


In Section 5.2 we discussed the problem of diagonalizing a symmetric matrix and 
defined the eigenvalues and eigenvectors of such matrices. 

In this chapter we consider the problem of diagonalizing a square, not necessarily
symmetric, matrix. The approach is quite different from that used in the
treatment of symmetric matrices. In Section 5.2 the eigenvalues were defined as
the diagonal entries that appear when the matrix is diagonalized. In this chapter
the eigenvalues are defined as the roots of a certain polynomial, the characteristic
polynomial of the matrix. With this definition the eigenvalues of a matrix are, in
general, complex numbers.

In order to define the characteristic polynomial, we need some facts about
determinants. These are developed in Section 7.1. The eigenvalues and eigenvectors
of a matrix are then defined in Section 7.2.

Three optional sections are given over to three different applications of the
material of Section 7.2. Linear difference and differential equations are discussed
in Sections 7.3* and 7.6*, respectively. One of the most interesting aspects here is
the way in which complex numbers are used to solve problems that seem at first to
involve only real numbers.

In Section 7.4* eigenvectors are used to analyze congruences in three-dimensional
space and obtain a geometric description of the nonsingular affine functions
f: R³ → R³.

Gerschgorin's theorem, Section 7.5*, gives a useful estimate for the eigenvalues
of a matrix.

The approach to eigenvalues and eigenvectors given in Section 7.2 provides
theoretical insights but not a practical approach to the computation of eigenvalues.
Section 7.7 describes the QR algorithm for computing the eigenvalues of a
general matrix. The section ends with some APL functions that are useful for
computing eigenvalues and eigenvectors of general matrices.






7.1 Determinants 


Determinants are really part of the subject matter of multilinear algebra or tensor
analysis. We need them, however, to develop the theory of the eigenvalues of a
general (i.e., nonsymmetric) matrix. We will develop only enough of the theory of
determinants to enable us to calculate them and use them to discuss eigenvalues.

Determinants are basically volumes: volumes with an algebraic sign attached.
We begin with some plane geometry to explain why one attaches an
algebraic sign to a volume. In the plane, of course, the word is area rather than
volume.

Let p, q, r be any three points in R² and let A(p, q, r) be the area of the triangle
with vertices p, q, and r [Figure 7.1(a)]. Ultimately we would like a formula for A
in terms of the coordinates of the points p, q, and r. Areas are additive. To be
precise, suppose we choose a point, call it o, in the interior of the triangle pqr. This
point divides pqr into three smaller triangles: opq, oqr, and orp.

From Figure 7.1(b) we have the formula 


$$A(p,q,r) = A(o,p,q) + A(o,q,r) + A(o,r,p)$$


This formula suggests taking o to be the origin and identifying the points
p, q, r with the vectors with their tails at the origin and their heads at p, q, and r,
respectively. If we do this, then given two vectors v, w it is sufficient to find a
formula for the area A(o, v, w) in order to compute A(p, q, r) for any three points.

There is a problem, however. The formula above works only when o is in the
interior. Figure 7.1(c) shows what happens when o is not in the interior of pqr
[corresponding triangles in (b) and (c) are shaded similarly]. Clearly the formula
above no longer holds in this case. It would hold, however, if the area of triangle
oqr were negative!

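Without further ado we make the convention

$$A(p,q,r) \text{ is } \begin{cases} \text{positive} & \text{if the circuit } p \to q \to r \to p \text{ is counterclockwise} \\ \text{negative} & \text{if the circuit } p \to q \to r \to p \text{ is clockwise} \end{cases}$$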


This convention makes the formula hold in all cases (Figure 7.2). 


FIGURE 7.1 (a), (b), (c)




FIGURE 7.2 (a), (b), (c)


In fact, with this convention we may reduce the calculation of the area of an
arbitrary polygon to the calculation of areas (with the proper sign) of the form
A(o, u, v). Consider, for example, the two polygons in Figure 7.3. The points p₁, p₂,
p₃, p₄, and p₅ as well as o are the same in both Figure 7.3(a) and 7.3(b), but
different polygons have been formed by connecting the points in a different order.
By carefully counting up the areas involved, always with the proper sign, the
reader will find that, with the obvious notation,


A(O. flys Pa) + AO, Pras Pa) + ACO, Pas Py) 

+ AO, Pas Pa) + ACO Pos Py) 
ACP Pay Pes Pav Ps) = As Pry Pg) + ACO- Pry Ps) + ACO; Pr Pa) 
+ ACO Pye Pa) + MO. Pye Py) 


AP Pas Ps Par Ps) 


(The second formula is a bit tricky, but it works.) 

The process works for polygons with an arbitrary number of sides, and since a
continuous curve can be approximated as closely as desired by a polygonal line,
the process can be used to approximate arbitrary areas in the plane. By taking the
process to the limit, one obtains a special case of the theorem from the calculus of
several variables known as Green's theorem. Such considerations also lead to the
mechanical device called the planimeter, which can be used to measure the area
of an irregular shape by tracing the boundary. Nowadays we could also enter the
coordinates of as many boundary points as desired directly into a computer by
using an electrical device called a digitizer or graphics tablet.


FIGURE 7.3


FIGURE 7.4

Let us return to the problem of computing A(o, u, v) for a pair of vectors u, v
in R² (Figure 7.4). The area of this triangle is half the area of the parallelogram
determined by the vectors u and v. We will calculate the area of the parallelogram.


DEFINITION 7.1 Let u, v be vectors in R². The determinant, det (u, v), of u and v
is the area of the parallelogram determined by u and v. This area is positive if the
circuit 0 → u → v → 0 is counterclockwise and it is negative if the circuit is
clockwise.

The determinant function has four properties that allow us to compute it:


1. det (u, v) = −det (v, u)
2. det (u, v) = det (u, v + λu), λ any scalar
3. det (λu, v) = λ det (u, v), λ any scalar
4. If u = (1, 0) and v = (0, 1), then det (u, v) = 1


The first property is directly from the definition. The second and third come
from the plane geometry formula for the area of a parallelogram:


A = bh


In Figure 7.5(a) the formula A = bh is illustrated. In Figure 7.5(b) we see that
neither b nor h nor the direction of the circuit changes when v is replaced by
v + λu. In Figure 7.5(c) we see that when u is replaced by λu, h remains
unchanged and b is replaced by |λ|b. If λ > 0, it follows that property 3 holds. If
λ < 0, then the direction of the circuit 0 → u → v → 0 is reversed, so property 3
holds in this case also.


FIGURE 7.5 (a), (b), (c)

To calculate the determinant of u and v we store them as the rows of a matrix.
Then properties 1 through 3 can be interpreted as statements about how row-reduction
operations change the determinant. Property 1 states that a row interchange
changes the sign of the determinant, property 2 states that a pivot operation
does not affect the determinant, and property 3 states that multiplying a row
by a scalar multiplies the determinant by the same scalar. This means that we can
calculate determinants by row-reducing the matrix with rows u, v.

If A is a 2-by-2 matrix, we define |A| = det (A[1;], A[2;]).


EXAMPLE 7.1 Compute the determinants of the following matrices:
((: ) ( 3) O ‘) 
eTyih NDT A\2ea 

Solution Row-reducing, we have 


(2,7),A = -2 


{fe ie 
1 I al={5 ‘ll property 2 with w= (1.3), u 


Lo 
= 15 1] (property 2 again) 
= (property 4) 
6 0 10 
tele |b o| (orpeny 3) 
10 
26:7 C1 3) 
lo {| Corner 3) 
=42 (property 4) 
3. 0 


2 
= ie Al (property 1) 


‘i | (property 3) 


}| (propeny 3) 


10 
i —6 (property 2 and property 4) 




In the example above we used two consequences of property 1 (combined with
properties 2 and 3):
2′. det (u, v) = det (u + λv, v), λ any scalar
3′. det (u, λv) = λ det (u, v), λ any scalar


Property 3′ holds, for example, because

$$\det(u,\lambda v) = -\det(\lambda v,u) = -\lambda\det(v,u) = -\lambda\,(-\det(u,v)) = \lambda\det(u,v)$$


We next show that this definition coincides with the definition of the determi- 
nant of a 2-by-2 matrix given in Chapter 2. 


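PROPOSITION 7.1

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - cb$$

Proof Suppose first that a ≠ 0. Then

$$\begin{aligned}
\begin{vmatrix} a & b \\ c & d \end{vmatrix}
&= a \begin{vmatrix} 1 & b/a \\ c & d \end{vmatrix} && \text{(property 3)} \\
&= a \begin{vmatrix} 1 & b/a \\ 0 & d - cb/a \end{vmatrix} && \text{(property 2)} \\
&= a\left(d - \frac{cb}{a}\right) \begin{vmatrix} 1 & b/a \\ 0 & 1 \end{vmatrix} && \text{(property 3}') \\
&= (ad - cb) \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} && \text{(property 2}') \\
&= ad - cb && \text{(property 4)}
\end{aligned}$$

The case a = 0 is left as an exercise. ∎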
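EXAMPLE 7.2 What is the area of the polygon with vertices p₁ = (1, 1),
p₂ = (3, 2), p₃ = (2, 3), p₄ = (3, 4), p₅ = (−1, 3), in that order?


Solution

$$\begin{aligned}
A(p_1,p_2,p_3,p_4,p_5) &= A(o,p_1,p_2) + A(o,p_2,p_3) + A(o,p_3,p_4) + A(o,p_4,p_5) + A(o,p_5,p_1) \\
&= \frac12\left( \begin{vmatrix} 1 & 1 \\ 3 & 2 \end{vmatrix} + \begin{vmatrix} 3 & 2 \\ 2 & 3 \end{vmatrix} + \begin{vmatrix} 2 & 3 \\ 3 & 4 \end{vmatrix} + \begin{vmatrix} 3 & 4 \\ -1 & 3 \end{vmatrix} + \begin{vmatrix} -1 & 3 \\ 1 & 1 \end{vmatrix} \right) \\
&= \tfrac12\left[(-1) + 5 + (-1) + 13 + (-4)\right] \\
&= \tfrac12(18 - 6) = 6
\end{aligned}$$

The polygon is sketched in Figure 7.6. ∎


FIGURE 7.6


FIGURE 7.7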
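The computation in Example 7.2 generalizes to any polygon: add det(p_k, p_{k+1})/2 around the boundary, wrapping from the last vertex back to the first. A minimal Python sketch (the function name is hypothetical); it reproduces the area 6 of Example 7.2, and traversing the vertices in the opposite order flips the sign, as the convention requires. This sum is often called the shoelace formula.

```python
def polygon_area(vertices):
    """Signed area of the polygon p1 -> p2 -> ... -> pn -> p1, computed as the sum of
    A(o, p_k, p_(k+1)) = det(p_k, p_(k+1)) / 2 with o the origin."""
    n = len(vertices)
    total = 0
    for k in range(n):
        (x1, y1), (x2, y2) = vertices[k], vertices[(k + 1) % n]
        total += x1 * y2 - y1 * x2   # det of the 2-by-2 matrix with rows p_k, p_(k+1)
    return total / 2

pts = [(1, 1), (3, 2), (2, 3), (3, 4), (-1, 3)]   # vertices of Example 7.2
print(polygon_area(pts))          # 6.0
print(polygon_area(pts[::-1]))    # -6.0 (clockwise traversal)
```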


We shall define the determinant of a square matrix by the four properties
listed above for the determinant of two vectors in the plane. This shows us how to
compute the determinant but leaves logical gaps. It does not prove that the function
called “determinant” actually exists, because the four conditions might be too
restrictive. If the conditions are not too restrictive, they might not be restrictive
enough, and there may be many different functions that satisfy them. We will not,
however, pursue these questions further.


DEFINITION 7.2 Let A be a square matrix. The determinant of A, written det (A)
or |A|, is the unique function satisfying conditions 1 through 4.


1. Interchanging two rows changes the sign of the determinant.


2. Adding a multiple of one row to a different row does not change the
determinant. In particular, pivoting on any entry of the matrix does not change the
determinant.


3. Multiplying a row by a scalar multiplies the determinant by that scalar.
4. The determinant of the n-by-n identity matrix is 1.
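Definition 7.2 translates directly into an algorithm: row-reduce, keeping track of the sign changes (property 1) and the scalings (property 3), and stop when the identity matrix (property 4) is reached. The following Python sketch does Gauss-Jordan reduction with exact fractions; it is an illustration under those assumptions, not the book's APL function DET, which uses reflections instead. The function name is hypothetical.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Determinant of a square matrix using only the row operations of Definition 7.2.

    Exact rational arithmetic (Fraction) avoids rounding error in the divisions.
    """
    M = [[Fraction(x) for x in row] for row in rows]
    n = len(M)
    factor = Fraction(1)   # invariant: det(original) = factor * det(current M)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # Column col is a combination of the earlier (already reduced) columns,
            # so the matrix is singular and its determinant is 0.
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            factor = -factor                    # property 1: a swap changes the sign
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]      # property 3: scaling by 1/piv divides det by piv
        factor *= piv                           # ... so compensate in the invariant
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]   # property 2: no change
    return factor                               # M is now ID n, with determinant 1 (property 4)

print(det_by_row_reduction([[1, 2], [3, 4]]))   # -2
```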




For n = 3 the three rows of the matrix, considered as vectors in R³, define a
solid called a parallelopiped (Figure 7.7). The volume of such a solid is the area
of the base (defined by u and v in the figure) times the height, which is the length
of the perpendicular dropped from w to the base. The volume is taken to be
positive if the circuit around the triangle formed by the tips of the vectors
u = A[1;], v = A[2;], w = A[3;], in that order, is counterclockwise when viewed from
the side of the triangle away from the origin. Otherwise the volume is negative. It is
not too difficult to see that the volume of such a parallelopiped satisfies properties
1 through 4.

We now develop some important properties of determinants. 


PROPOSITION 7.2 Let A and B be n by n. Then

$$\det(AB) = \det(A)\det(B)$$

Proof First notice that the formula holds when A is an elementary matrix:

A is a switch matrix: A is obtained from ID n by interchanging two rows.
Hence det (A) = −1. AB is B with two rows interchanged, hence det (AB) =
−det (B).

A is a multiplier matrix: Suppose A multiplies a row by λ. Then det (AB) =
λ det (B) by property 3. A is obtained by multiplying a row of ID n by λ, hence
det (A) = λ.

A is a pivot matrix: By property 2, det (AB) = det (B). A is obtained by
multiplying rows of ID n by scalars and adding the result to other rows. Hence by
property 2 (and 4), det (A) = 1.

The result now follows if A is a product of elementary matrices, A = E₁E₂⋯E_k:

$$\begin{aligned}
\det(AB) &= \det(E_1E_2\cdots E_kB) \\
&= \det(E_1)\det(E_2\cdots E_kB) \\
&= \det(E_1)\det(E_2)\cdots\det(E_k)\det(B) \\
&= \det(E_1E_2\cdots E_k)\det(B) \\
&= \det(A)\det(B)
\end{aligned}$$

If A is not invertible (see Proposition 3.5) then neither is AB. In this case both
determinants are zero. We defer the proof to Proposition 7.5. ∎


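PROPOSITION 7.3 Let A, C be square matrices and B a matrix of compatible size. Then

$$\det\begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = \det(A)\det(C)$$

Proof

$$\begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & C \end{bmatrix}\begin{bmatrix} I & B \\ 0 & I \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix}$$

and pivot operations alone will reduce

$$\begin{bmatrix} I & B \\ 0 & I \end{bmatrix}$$

to an identity matrix. Thus by Proposition 7.2

$$\det\begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = \det\begin{bmatrix} I & 0 \\ 0 & C \end{bmatrix}\det\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix}$$

Now precisely the same row reduction that reduces A to echelon form may be
used to reduce

$$\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix}$$

to echelon form, so

$$\det(A) = \det\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix}$$

Similarly for the factor containing C. ∎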


PROPOSITION 7.4 Let A be a square matrix. Then det (A) = det (Aᵀ).


Proof If A = E₁E₂⋯E_k, a product of elementary matrices, then
Aᵀ = E_kᵀ⋯E₂ᵀE₁ᵀ, so in this case it will be sufficient to prove the proposition for
elementary matrices. Switch and multiplier matrices are symmetric, so the question
reduces to pivot matrices.

We can factor a pivot matrix into even more elementary matrices. Let
E(i, j, λ) be the matrix obtained by multiplying the ith row of ID n by λ and
adding it to the jth row. Then det (E(i, j, λ)) = 1, E(i, j, λ)ᵀ = E(j, i, λ), and a
pivot matrix is a product of E(i, j, λ)'s. If A is singular then so is Aᵀ (Proposition
3.6) and both determinants are zero. See Proposition 7.5. ∎


PROPOSITION 7.5 Let A be a square matrix. A is invertible if and only if
det (A) ≠ 0, and if A is invertible then

$$\det(A^{-1}) = \frac{1}{\det(A)}$$

In particular, det (P⁻¹AP) = det (A) for any invertible matrix P.


Proof Write A = E₁E₂⋯E_kE, where each E_i is elementary and E is the row echelon form
of A. Since det (E_i) ≠ 0 (see the proof of Proposition 7.2) and det (A) = det (E₁)
det (E₂) ⋯ det (E_k) det (E), we see that det (A) ≠ 0 if and only if det (E) ≠ 0.
A is invertible if and only if E is an identity matrix, and E, being square and
echelon, is an identity matrix if and only if it has no zero rows. So the proposition
is proved once we establish that a matrix with a zero row has determinant zero.

If A has a row of zeros, multiply the row by 0. Then by property 3, since A
remains unchanged,

$$\det(A) = 0 \cdot \det(A) = 0$$

If A⁻¹ exists, then

$$\det(A)\det(A^{-1}) = \det(AA^{-1}) = \det(I) = 1$$

So

$$\det(A^{-1}) = \frac{1}{\det(A)}$$

Then

$$\det(P^{-1}AP) = \det(P)^{-1}\det(A)\det(P) = \det(A) \qquad ∎$$


Given a matrix A, let R be the special reflection matrix that carries the first
column c = A[;1] of A into (‖c‖, 0, 0, …, 0). Then

$$RA = \begin{bmatrix} \|c\| & B \\ 0 & C \end{bmatrix}$$

so, by Propositions 7.2 and 7.3,

$$\det(R)\det(A) = \det(RA) = \det\begin{bmatrix} \|c\| & B \\ 0 & C \end{bmatrix} = \|c\|\det(C)$$

and

$$\det(A) = \frac{\|c\|\det(C)}{\det(R)}$$

The matrix C is (n − 1) by (n − 1), so if det (R) is reasonably easy to compute,
we can invoke mathematical induction for the computation of det (C) and
use the formula to get det (A).

The matrix R is a reflection in a one-dimensional subspace of Rⁿ.


PROPOSITION 7.6 Let R be an n-by-n special reflection matrix. Then

$$\det(R) = (-1)^{n-1}$$

Proof Let R be the matrix of reflection in the subspace of Rⁿ generated by v. Let
X = PX′ define an orthonormal coordinate system in which v points along the x′₁
axis. Then in X′ coordinates we have reflection in the x′₁ axis, so, using Proposition
5.18,

$$P^{T}RP = \begin{bmatrix} 1 & 0 \\ 0 & -I_{n-1} \end{bmatrix}$$

So

$$\begin{aligned}
\det(R) &= \det(P^{T}RP) && \text{by Proposition 7.5} \\
&= (-1)^{n-1} && \text{by property 3}
\end{aligned}$$ ∎

If A is n by n, then the formula above becomes

$$\det(A) = (-1)^{n-1}\,\|c\|\,\det(C)$$

We can use the function SPRFLCTR from Section 5.5 to compute R, but we
must exercise some care. SPRFLCTR will return an identity matrix instead of a
reflection if the first column of A is already in the proper form. Thus the factor
(−1)ⁿ⁻¹ is present only when R ≠ ID n.


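    ∇ Z←DET A;R
[1]   A←(R←SPRFLCTR A[;1])+.×A              ⍝ RA: first column becomes (‖c‖,0,…,0)
[2]   Z←A[1;1]×(¯1*¯1+1↑⍴A)*∨/,R≠ID 1↑⍴A   ⍝ factor (¯1)*n−1 only when R≠ID n
[3]   →(1=1↑⍴A)/0                           ⍝ 1-by-1 case: done
[4]   Z←Z×DET 1 1↓A                         ⍝ recurse on the lower-right block C
    ∇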
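The recursion in DET can be mirrored in Python without the function SPRFLCTR, whose exact behavior is not reproduced here. As an assumption of this sketch, R is built as the reflection 2uuᵀ − I in the line through the unit vector u that bisects c/‖c‖ and the first coordinate axis; that carries c to (‖c‖, 0, …, 0), and because it is always a genuine reflection the factor (−1)ⁿ⁻¹ is always applied. The product RA is formed as 2u(uᵀA) − A so that R is never stored. The function name is hypothetical.

```python
import math

def det_by_reflections(A):
    """Determinant via det(A) = (-1)**(n-1) * ||c|| * det(C), as in the function DET."""
    n = len(A)
    if n == 1:
        return A[0][0]
    c = [row[0] for row in A]                   # first column of A
    norm_c = math.sqrt(sum(x * x for x in c))
    if norm_c == 0:
        return 0.0                              # zero column: A is singular
    u = [x / norm_c for x in c]
    u[0] += 1.0                                 # bisector of c/||c|| and e1
    norm_u = math.sqrt(sum(x * x for x in u))
    if norm_u < 1e-12:                          # c points along -e1: reflecting in
        u, norm_u = [0.0] * n, 1.0              # the e2 axis also sends c to ||c|| e1
        u[1] = 1.0
    u = [x / norm_u for x in u]
    uTA = [sum(u[i] * A[i][j] for i in range(n)) for j in range(n)]
    RA = [[2 * u[i] * uTA[j] - A[i][j] for j in range(n)] for i in range(n)]
    C = [row[1:] for row in RA[1:]]             # lower-right (n-1)-by-(n-1) block
    return (-1) ** (n - 1) * norm_c * det_by_reflections(C)

print(det_by_reflections([[1, 2], [3, 4]]))     # approximately -2.0
```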


EXAMPLE 7.3 Choosing a 5-by-5 matrix at random, we have


94 8 90-13-73 
11 24-15 977 64 
23 58 88 -94 58 
69-25 68 5 63 
38 75 46-86 64 

DET A 

304376154 


71 Determinants 459 


Notice the size of the answer. Each entry has, roughly speaking, an order of
magnitude of about 10², and so the same is true of ‖c‖ at every level. Since the matrix
is 5 by 5, the determinant ought to have an order of magnitude of roughly
(10²)⁵ = 10¹⁰.


The unexpectedly large size of such a determinant complicates using a function
such as DET to detect invertibility. A determinant must be judged small not in
comparison with the entries of A but in comparison with their size raised to the
nth power, where A is n by n.

For example, the matrix A below has rank 4, 


A 
2) 96 «68 at 
ai 210A ahi 87 
29-250 «28 81a 
425 879 678 230 -1009, 
a7 34122 

DET A 

2.3767 


Since the expected size of the determinant is about 10", we can say that 107 
is negligible compared to the expected size of the determinant (for this system 
Der is about te 15). 

A function such as DET is easy to write, and having it around does no harm.
Determinants are theoretical tools, however, and are not of much practical use for
data processing.

Det (A) is the (signed) volume of the parallelopiped defined by the rows of A.
By Proposition 7.4 it is also the volume of the parallelopiped defined by the
columns of A.

Now suppose that Y = f(X) = b + AX is an affine function from Rⁿ to Rⁿ.
Let C be the “unit parallelopiped” in Rⁿ, that is, the one determined by the
columns of ID n. Then f carries C to a translate of the parallelopiped determined
by the columns of A. So it carries C, which has a volume of 1, to a parallelopiped
with volume det (A). The next proposition, which can be proved by the methods
of advanced calculus, generalizes this fact.


PROPOSITION 7.7 Let f(X) = b + AX be an affine map on Rⁿ. Let S be a subset
of Rⁿ with (signed) volume V(S). Then

$$V(f[S]) = \det(A)\,V(S)$$
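Proposition 7.7 can be seen concretely in the plane by mapping a polygon and comparing signed areas. In this Python sketch the matrix A, the translation b, and the function names are hypothetical; the image of the counterclockwise unit square has signed area det(A) · 1, and a map with negative determinant reverses orientation, giving a negative signed area.

```python
def polygon_area(vertices):
    """Signed area of a polygon, as the sum of det(p_k, p_(k+1)) / 2."""
    n = len(vertices)
    return sum(vertices[k][0] * vertices[(k + 1) % n][1]
               - vertices[k][1] * vertices[(k + 1) % n][0] for k in range(n)) / 2

def affine_map(A, b, point):
    """f(X) = b + AX for a 2-by-2 matrix A (list of rows) and translation b."""
    x, y = point
    return (b[0] + A[0][0] * x + A[0][1] * y,
            b[1] + A[1][0] * x + A[1][1] * y)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # unit square, counterclockwise, V(S) = 1
A, b = [[2, 1], [0, 3]], (5, -1)             # hypothetical map with det(A) = 6
print(polygon_area([affine_map(A, b, p) for p in square]))      # 6.0
swap = [[0, 1], [1, 0]]                      # det = -1: orientation is reversed
print(polygon_area([affine_map(swap, b, p) for p in square]))   # -1.0
```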


In R² and R³ we have a connection between the volume of the parallelopiped
determined by the rows of a matrix A and the determinant of A via a formula for
the volume of the parallelopiped. In the case of R² the area is the base times the
height, where the height is determined by dropping a perpendicular (Figure 7.5).
Similarly, in R³ the volume of the parallelopiped is the area of the base times the
height, where the height is determined by dropping a perpendicular to the plane
of the base (Figure 7.7).

This suggests an inductive definition of the volume of a parallelopiped
whose edges are the vectors v₁, v₂, …, v_h.

Suppose we know the volume of the parallelopiped with edges v₁, v₂, …,
v_{h−1}. Call it V′. Let v_h = v′ + v* be the decomposition of v_h with respect to the
subspace generated by v₁, v₂, …, v_{h−1}, where v′ lies in the subspace and v* is
perpendicular to it. Define the (unsigned) volume of the parallelopiped with edges
v₁, …, v_h to be V = ‖v*‖V′. In this case we have v₁, …, v_h vectors in Rⁿ with
h ≤ n. When h = n we have defined the volume as det (A), where the rows (or
columns) of A are v₁, …, v_n. Thus we have two possible definitions for h = n. The
next proposition shows that the two definitions are consistent.


PROPOSITION 7.8 Let v₁, …, v_h in Rⁿ be the columns of the matrix A. The square
of the volume of the parallelopiped defined by v₁, …, v_h is det (AᵀA). Thus if
h = n, the square of the volume is det (A)².


Proof Let B = Aᵀ. Then the rows of B are v₁, …, v_h and AᵀA = BBᵀ. We work
with B, since elementary row operations are more familiar than elementary column
operations. Let v_h = v′ + v* with respect to the subspace generated by
v₁, …, v_{h−1}, and let B₁ be the matrix formed by the first h − 1 rows of B. Then

$$B = \begin{bmatrix} B_1 \\ v' + v^* \end{bmatrix}$$

and v′ is a linear combination of the rows of B₁. Thus by multiplying the rows of
B₁ by scalars and adding them to v′ + v* we can row-reduce the matrix B to

$$B_2 = \begin{bmatrix} B_1 \\ v^* \end{bmatrix}$$

Since B = PB₂, where det (P) = 1, we have

$$|BB^{T}| = |PB_2B_2^{T}P^{T}| = |P|\,|B_2B_2^{T}|\,|P^{T}| = |B_2B_2^{T}|$$

Now, since the rows of B₁ are perpendicular to v*,

$$|B_2B_2^{T}| = \begin{vmatrix} B_1B_1^{T} & B_1v^{*T} \\ v^*B_1^{T} & v^*v^{*T} \end{vmatrix} = \begin{vmatrix} B_1B_1^{T} & 0 \\ 0 & \|v^*\|^2 \end{vmatrix} = \|v^*\|^2\,|B_1B_1^{T}|$$

By induction we may assume that |B₁B₁ᵀ| is the square of the volume determined
by v₁, v₂, …, v_{h−1}. Thus we need only start the induction.
For h = 1, however,

$$|BB^{T}| = |v_1v_1^{T}| = \det\big(\|v_1\|^2\big) = \|v_1\|^2$$

the square of the length of v₁. ∎
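Proposition 7.8 gives a practical recipe for the volume of an h-dimensional parallelopiped in Rⁿ: form the h-by-h matrix of dot products AᵀA and take the square root of its determinant. A minimal Python sketch, with a small cofactor-expansion determinant included so the block is self-contained (function names hypothetical); applied to the three vectors in Example 7.4(b) it gives √3749 ≈ 61.22907806, since det(AᵀA) = 3749 for those vectors.

```python
import math

def det(M):
    """Expansion by minors along the first row (adequate for small Gram matrices)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def parallelopiped_volume(vectors):
    """Unsigned volume of the parallelopiped with edges v1, ..., vh in R^n,
    computed as sqrt(det(A^T A)) where A has the vectors as columns (Proposition 7.8)."""
    gram = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]  # A^T A
    return math.sqrt(det(gram))

print(parallelopiped_volume([(1, 5, 6, 1, 5), (1, 0, 1, 0, 2), (2, 3, -1, -1, -2)]))
# approximately 61.22907806
```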


EXAMPLE 7.4 


(a) What is the (unsigned) area of the parallelogram determined by the vectors 
(1,1, 1) and (1, =I, 2, 4) in RY 


(b) What is the (unsigned) volume of the parallelopiped determined by the
vectors (1, 5, 6, 1, 5), (1, 0, 1, 0, 2), and (2, 3, −1, −1, −2) in R⁵?


Solution 
fa) Let 


then 


= 
= 
" 


(b) With A the 5-by-3 matrix whose columns are the three vectors,


      (DET (⍉A)+.×A)*0.5
61.22907806
∎


Because of the divisions involved, the two methods of evaluating determinants
discussed so far are often inconvenient for hand calculation with small
matrices. The method of “expansion by minors” is often useful for small matrices
or matrices of a special form.

The method is easily derived from the next proposition.




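PROPOSITION 7.9 Let the n-by-n matrix A = [c₁|c₂|⋯|c_n], where c_i = A[;i] is a
vector in Rⁿ. Then, for any vector c′_h in Rⁿ,

$$\det[c_1|c_2|\cdots|c_h + c'_h|\cdots|c_n] = \det[c_1|c_2|\cdots|c_h|\cdots|c_n] + \det[c_1|c_2|\cdots|c'_h|\cdots|c_n]$$

A similar statement is true if A is partitioned by rows.


Proof We will prove the corresponding statement about rows, since row reduction
is more familiar than column reduction. The statement about columns will
then follow by taking transposes (Proposition 7.4).

Interchanging rows h and 1 of A will change only the signs of the three
determinants involved, so we may as well assume h = 1. Then the statement to be
proved reads

$$\det\begin{bmatrix} r_1 + r'_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} = \det\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} + \det\begin{bmatrix} r'_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}$$

First notice that if the vectors r₂, r₃, …, r_n are linearly dependent, then all
three matrices are singular and all three determinants are zero. Thus we may
assume that r₂, …, r_n are linearly independent.

Next suppose that r₁ is in the space spanned by r₂, r₃, …, r_n. Then the first
determinant on the right is zero. On the other hand, by multiplying the rows
r_i with i ≥ 2 by constants and adding them to the first row, we may row-reduce the
matrix with rows r₁ + r′₁, r₂, …, r_n to the matrix with rows r′₁, r₂, …, r_n without
changing the determinant. Thus the formula holds in this case also.

Thus we may assume that r₁ is not in the subspace spanned by r₂, …, r_n. It
follows that r₁, r₂, …, r_n are a basis of Rⁿ (why?) and hence r′₁ is a linear
combination of them: r′₁ = λ₁r₁ + λ₂r₂ + ⋯ + λ_nr_n. From this it follows that

$$\begin{bmatrix} r_1 + r'_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} = \begin{bmatrix} (1+\lambda_1)r_1 + \lambda_2r_2 + \cdots + \lambda_nr_n \\ r_2 \\ \vdots \\ r_n \end{bmatrix} \text{ row-reduces to } \begin{bmatrix} (1+\lambda_1)r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}$$

and

$$\begin{bmatrix} r'_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} \text{ row-reduces to } \begin{bmatrix} \lambda_1r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}$$

Thus, using property 3,

$$\det\begin{bmatrix} r_1 + r'_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} = (1+\lambda_1)\det\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} = \det\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} + \det\begin{bmatrix} \lambda_1r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} = \det\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} + \det\begin{bmatrix} r'_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}$$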
ay 
EXAMPLE 7.5. Suppose that 
a, by a 
dy by Cs) =5, lady 
yy cg ay 
Evaluate 
a, by + 3b, 
ae 
ay hy + 3% 


Solution By Proposition 7.9 


a, by + 3b, cy 
= lay ly, wee 
ay by +305 cy 




4 

% 

Iu 
1 " Lat 

hh a 

=}=det| =| + det 
‘n Mw In 
bg 
Oc 
BS) ey 
* 






EXPANSION BY MINORS 
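Let us apply the last proposition to the evaluation of a 3-by-3 determinant,
splitting the first column as (a₁₁, 0, 0) + (0, a₂₁, 0) + (0, 0, a₃₁):

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} 0 & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} 0 & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

We can apply Proposition 7.3 to the first determinant, and we can switch rows on
the other two to bring them into the same form as the first:

$$\begin{aligned}
&= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - \begin{vmatrix} a_{21} & a_{22} & a_{23} \\ 0 & a_{12} & a_{13} \\ 0 & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{31} & a_{32} & a_{33} \\ 0 & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \end{vmatrix} \\
&= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
\end{aligned}$$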

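Notice that in the last term one more row interchange was introduced to put the
truncated rows back into their original order. The 2-by-2 determinants are now
easily evaluated. The calculation above is an example of expansion by minors on
the first column. Any row or column may be used. If we had wanted to use the
second column rather than the first, we could have started with

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & 0 & a_{23} \\ a_{31} & 0 & a_{33} \end{vmatrix} + \begin{vmatrix} a_{11} & 0 & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & 0 & a_{33} \end{vmatrix} + \begin{vmatrix} a_{11} & 0 & a_{13} \\ a_{21} & 0 & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$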


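and then proceeded in the same way. The process can be abbreviated as follows.
Let A be a matrix. The i;j cofactor (signed minor) of A, written A_ij, is obtained by
deleting the ith row and jth column from A, computing the resulting determinant,
and then multiplying by (−1)^{i+j}.
That is, cross out the ith row and jth column of A and multiply the resulting
determinant by the sign assigned to the ij entry according to the checkerboard
pattern. If

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

then the sign pattern is

$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}$$

and

$$A_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \quad A_{21} = -\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}, \quad A_{31} = \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$$

and so on. The expansion by minors on the first column of A that was worked out
above can now be written

$$\det(A) = a_{11}A_{11} + a_{21}A_{21} + a_{31}A_{31}$$

One should not attempt to remember such formulas, but simply work by
crossing out rows and columns and using the sign pattern.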


Exampce 7.6 Evaluate 


-3 8 -34 
-3 8 -3 3 
-6 12 -4 6 
2.2 03 


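Solution There is a zero in the 4;3 place, so expanding on the fourth row or third column will reduce the problem to one of evaluating three determinants of size 3-by-3. The sign pattern for a 4-by-4 matrix is

$$\begin{bmatrix} + & - & + & -\\ - & + & - & +\\ + & - & + & -\\ - & + & - & +\end{bmatrix}$$

Expanding on the fourth row, and then factoring 4 out of the first column of the first 3-by-3 determinant and $-3$ out of the first column of the second, we get

$$\begin{vmatrix} -3 & 8 & -3 & 4\\ -3 & 8 & -3 & 3\\ -6 & 12 & -4 & 6\\ -2 & 2 & 0 & 3\end{vmatrix}
= 2\begin{vmatrix} 8 & -3 & 4\\ 8 & -3 & 3\\ 12 & -4 & 6\end{vmatrix}
+ 2\begin{vmatrix} -3 & -3 & 4\\ -3 & -3 & 3\\ -6 & -4 & 6\end{vmatrix}
+ 3\begin{vmatrix} -3 & 8 & -3\\ -3 & 8 & -3\\ -6 & 12 & -4\end{vmatrix}$$
$$= 8\begin{vmatrix} 2 & -3 & 4\\ 2 & -3 & 3\\ 3 & -4 & 6\end{vmatrix}
- 6\begin{vmatrix} 1 & -3 & 4\\ 1 & -3 & 3\\ 2 & -4 & 6\end{vmatrix}
+ 3\begin{vmatrix} -3 & 8 & -3\\ -3 & 8 & -3\\ -6 & 12 & -4\end{vmatrix}$$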




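Expanding the first 3-by-3 determinant along the bottom row gives

$$\begin{vmatrix} 2 & -3 & 4\\ 2 & -3 & 3\\ 3 & -4 & 6\end{vmatrix} = 3\begin{vmatrix} -3 & 4\\ -3 & 3\end{vmatrix} + 4\begin{vmatrix} 2 & 4\\ 2 & 3\end{vmatrix} + 6\begin{vmatrix} 2 & -3\\ 2 & -3\end{vmatrix} = 3(3) + 4(-2) + 6(0) = 1$$

Multiplying the first column of the second 3-by-3 determinant by 3 and adding to the second gives

$$\begin{vmatrix} 1 & -3 & 4\\ 1 & -3 & 3\\ 2 & -4 & 6\end{vmatrix} = \begin{vmatrix} 1 & 0 & 4\\ 1 & 0 & 3\\ 2 & 2 & 6\end{vmatrix} = -2\begin{vmatrix} 1 & 4\\ 1 & 3\end{vmatrix} = -2(-1) = 2$$

Multiplying the first row of the third 3-by-3 determinant by $-1$ and adding to the second row gives

$$\begin{vmatrix} -3 & 8 & -3\\ -3 & 8 & -3\\ -6 & 12 & -4\end{vmatrix} = \begin{vmatrix} -3 & 8 & -3\\ 0 & 0 & 0\\ -6 & 12 & -4\end{vmatrix} = 0$$

Thus

$$\begin{vmatrix} -3 & 8 & -3 & 4\\ -3 & 8 & -3 & 3\\ -6 & 12 & -4 & 6\\ -2 & 2 & 0 & 3\end{vmatrix} = 8(1) - 6(2) + 3(0) = -4$$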


Two results that can be useful in theoretical discussions are not hard to derive using the expansion-by-minors formulas. We will have no use for them but list them for the sake of completeness. They should not be used for computational purposes.


PROPOSITION 7.10 (Cramer's rule) Let A be an invertible n-by-n matrix and B a vector in $R^n$. Define $A_i$ to be A with A[;i] replaced by B. The unique solution of

$$AX = B$$

is given by $x_i = \det(A_i)/\det(A)$.




PROPOSITION 7.11 (The adjoint formula) Let A be n by n. Define $\hat{A}$ by $\hat{A}[i;j] = A_{ji}$, the $j;i$ cofactor of A. Then

$$A\hat{A} = \det(A)\, I$$

where I = ID n. Thus if A is invertible, $A^{-1} = (\det A)^{-1}\hat{A}$.
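Both formulas are easy to check on small cases. This Python sketch uses exact rational arithmetic from the standard `fractions` module; its determinant is computed by Gaussian elimination (a swapped-in technique, not the book's DET), and the helper names are hypothetical. As the text warns, neither Cramer's rule nor the adjoint formula is a sensible way to solve systems or invert matrices in practice.

```python
from fractions import Fraction


def det(A):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)                 # no pivot: the matrix is singular
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result                   # a row interchange flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return result


def cramer(A, B):
    """x_i = det(A_i)/det(A), where A_i is A with column i replaced by B."""
    d = det(A)
    n = len(A)
    return [det([row[:i] + [B[r]] + row[i + 1:] for r, row in enumerate(A)]) / d
            for i in range(n)]


def adjoint(A):
    """Matrix whose i;j entry is the j;i cofactor of A."""
    n = len(A)

    def cofactor(i, j):
        sub = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(sub)

    return [[cofactor(j, i) for j in range(n)] for i in range(n)]


A = [[2, 1], [1, 3]]
print(cramer(A, [3, 5]))      # [Fraction(4, 5), Fraction(7, 5)]
```

Multiplying A by `adjoint(A)` should give det(A) = 5 times the identity.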


EXERCISES 7.1 


Evaluate the determinants of the matrices in exercises 1 through 27 by any method. 


-!l -3 = aad 1 
toa 3. |o2 -1 
a, 4 oo 4 
aot 2s ey 
yi) eet aie oy 
even Vm ad 
2 0 3 a gieaues 
7 he ath Bll eulilsatameotia 
-4 4 2 33 -4 -3 
n aarigs Pdi} 
-10 -2 - ere eer 
Wa) 2 Pil) tsieiad a 
on -3 0 -1 a8 a 1 3 1 
30 2 Earley a 
er) (ees 
2 s 3 0 0 
trea eat = (tet) 
-6 -Il -3 0 -2 3 
-3 -4 -22 
=I 2 ol 
= 0 -2 10 ue 
-6 -4 -2 5 
V 18 
9 2. |18 -6 -5 ~6 0 
26 2 -8 -3 2 
0 0 0 Ot 




=1 -122 8 -4 -2] 3.000 -2 
0 9-4 4 @ 4-100 -2 
Boe eae aa 10 m2: || =o eo 
O15 & 7 0 4 001 -4 
OG Oi to oO) 4 000 -3 
BO’ 14-8 =8 
Sie ax 6 a 
2%. | 48 28 -21 -12 -10 
Sem TB 2 
36 «0-16 -R -7 
Wo 2 <5 -5 9 -3 
=F 10) ee eee oe od) 
ce 7 a GN 0 
m4 Bea) if 72) fs 1k 
4 ls 38 53) 10 <8 
9 WB -13 -5 10 1 
B 2% -17 -5 oN =4 
-0 -4 -4 mM -6 4 
-57 -15 -12 39 -18 10 
=wins =4) tie. SR 14) 
25, 2 7-17 10 =4 
9 -4 -3 0 4 -2 0 
-105 52 15 12 -33 17 -8 
=33 16 7 4@ =I) 6 =3, 
6 -9 
8 14 
%. | -2 2 
1 3 
10 16 
=F 15, 
24 -2 
28 -2 
-17 > 
2, 5 0 
2 —2 
B 0 
-19 ai 
Let = 0.0) pe = (50 Py = Bs Ys Pa = 2.2), Py = (1,3), Pg = (1,2). Im exer- 


cises 28 through 35 a polygon in R® is given by specifying the order of the vertices that are 
taken from) the set of points p, through p,. Sketch the polygon and compute its (signed) 
area. 


28 MPPs 


29. PsPuPsP2PiPo 30. PPsP2PPsPo 




31 PsPsPaP2PsPoPs 32 PPaPsP2PoPsP2 
34. PPP PoP s 35. PyPaPaPaPPaPs 


33. PiPsPsPoP Po) 


36. Show that if π is a polygon in the plane and the coordinates of the vertices of π are all integers, then the area of π is either an integer or an integer plus 1/2.


37. Let R be the matrix of reflection in an h-dimensional subspace of $R^n$. What is det(R)?

Hint: Modify the proof of Proposition 7.6.

38. (a) Show that the linear transformation with matrix $\begin{bmatrix} a & 0\\ 0 & b\end{bmatrix}$ carries the unit circle $x^2 + y^2 = 1$ to the ellipse $(x^2/a^2) + (y^2/b^2) = 1$.

Hint: Parametrize the circle as $\begin{bmatrix} \cos t\\ \sin t\end{bmatrix}$, apply the mapping, and eliminate the parameter t.

(b) Derive a formula for the area of an ellipse with semiaxes a and b.

Hint: Proposition 7.7.


In exercises 39 through 45 vectors defining parallelopipeds in $R^n$ are given. Use Proposition 7.8 to compute the (unsigned) volume of the given parallelopiped.


39. ADO.) 40. (0,.D.(=1, 1.0) 
41 (2.0.2), 1,0, 4 (1, =1.0,1,0, 1,0) 


43. (1, 1.0,0),(0, 1,0, 1),(1.0, 1,0) 
44. (1,0.1.00.0.1,0. D1 LD 
45, AAA AD 13, =2, 3), (1, 1,0,0,0),,0, 1,0, 1) 
46. Write an APL function called VOL that takes a matrix whose columns define a parallelopiped in $R^n$ and returns the (unsigned) volume of the parallelopiped.
47. Show that a matrix with two identical columns or rows has a zero determinant.
48. Show that a matrix of integers with determinant equal to 1 has an inverse matrix whose entries are integers.
49. Show that if A has zeros below the main diagonal, then the determinant is the product of the diagonal elements.

Hint: Expand by minors down the first column and use induction.
50. Show that an orthogonal matrix has determinant equal to ±1.

Hint: Consider $\det(AA^T)$.
51. (a) Let A be symmetric. Show that det(A) is the product of the eigenvalues of A.

Hint: Apply Proposition 5.9 and exercise 49.
(b) For an arbitrary square matrix A, show that |det(A)| is the product of the singular values of A.
52. Write an APL function:

Name: PLANIMETER
Input: Points $p_1, p_2, \ldots, p_k$ in $R^2$ stored as the columns of a matrix P
Output: The area of the polygon $p_1p_2\cdots p_k$
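Exercises 28 through 36 and 52 rest on the fact that the signed area of a polygon is half a sum of 2-by-2 determinants. As one possible approach, here is a Python sketch of that sum (often called the shoelace formula). It assumes the convention that counterclockwise order gives positive area, and it takes a list of (x, y) pairs rather than the columns of a matrix P; the name `planimeter` simply echoes exercise 52.

```python
from fractions import Fraction


def planimeter(points):
    """Signed area of the closed polygon p1 p2 ... pk.

    Each term x_i*y_{i+1} - x_{i+1}*y_i is the 2-by-2 determinant det[p_i | p_{i+1}];
    half their sum is the area, positive when the vertices run counterclockwise.
    """
    k = len(points)
    twice_area = 0
    for i in range(k):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % k]        # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return Fraction(twice_area) / 2


print(planimeter([(0, 0), (1, 0), (1, 1), (0, 1)]))   # 1
print(planimeter([(0, 0), (0, 1), (1, 1), (1, 0)]))   # -1
```

With integer vertices every determinant is an integer, which is the starting point for exercise 36.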




7.2 Eigenvalues and Eigenvectors 


In Section 5.2 the problem of diagonalizing a symmetric matrix was solved. In this 
section we approach the problem for a general matrix A. The solution will be less 
complete because not every matrix can be diagonalized. 

We begin by considering a typical diagonal matrix.

$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0\\ 0 & d_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & d_n\end{bmatrix}$$

This matrix defines a linear transformation $Y = f(X) = DX$ from $R^n$ to $R^n$. In terms of coordinates $x_1, x_2, \ldots, x_n$ we have

$$f(x_1, x_2, \ldots, x_n) = (d_1x_1, d_2x_2, \ldots, d_nx_n)$$

Thus the $i$th coordinate of X is stretched by the factor $d_i$. This means that for any vector v lying along the $x_i$ axis

$$f(v) = Dv = d_iv$$

This gives us the approach to diagonalization we shall use. Given a matrix A, we look for vectors v and scalars λ such that

$$Av = \lambda v$$

If we can find a basis of $R^n$ consisting of such vectors $v_1, v_2, \ldots, v_n$, then we can define a coordinate change $Y = PX'$, with $P = [v_1|v_2|\cdots|v_n]$, so that $v_i$ points along the $x_i'$ axis. In X′ coordinates the function $f(X) = AX$ will have the form

$$f(x_1', x_2', \ldots, x_n') = (\lambda_1x_1', \lambda_2x_2', \ldots, \lambda_nx_n')$$

and so

$$D = P^{-1}AP$$

will be diagonal.


The problem then is to find both the v's and the λ's. That is, we want all solutions of

$$AX = \lambda X$$

where not only X but λ also is unknown. This is no longer a system of linear equations; the λ makes it nonlinear, so our previously developed techniques do not apply.

An approach is available that is quite valuable for theoretical purposes, though it does not give a reliable computational method. Rewrite the equation above as

$$AX - \lambda X = 0$$
or
$$(A - \lambda I)X = 0$$

where I = ID n. The one obvious solution to the equation, λ arbitrary and X = 0, does us no good whatever. The zero vector cannot be used to define an axis of a coordinate system. Thus we are interested only in solutions with X ≠ 0. But this means that $A - \lambda I$ must be singular. Thus we wish to find the values of λ for which $A - \lambda I$ is singular. Once such a λ is found, say λ = d, then the corresponding X's are just the null space of $A - dI$, and we have at least two ways (ECHELON and PERP) of computing null spaces. To find the λ's for which $A - \lambda I$ is singular we could now reduce the matrix $A - \lambda I$ to find the condition on λ for an infinity of solutions of $(A - \lambda I)X = 0$. This is a tedious procedure, and ECHELON cannot be used because of the unknown parameter λ.

Alternately one can use the fact that $A - \lambda I$ is singular if and only if $\det(A - \lambda I) = 0$ and try to find the roots of the nonlinear equation

$$p(\lambda) = \det(A - \lambda I) = 0$$

This last approach is very useful for theoretical arguments, although it is impractical as a computational method.


EXAMPLE 7.7 Diagonalize the matrix 


Solution First we compute the function $p(\lambda) = \det(A - \lambda I)$. Notice that $A - \lambda I$ is just A with λ subtracted from each diagonal entry. Expanding by minors down the first column and collecting terms, we have

$$p(\lambda) = -(\lambda^3 - 6\lambda^2 + 11\lambda - 6)$$




Thus we need to solve the equation

$$-p(\lambda) = q(\lambda) = \lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0$$

This example has been chosen with integer roots. An integer root of a polynomial with integer coefficients must divide the constant term, so the possible roots are the divisors of 6, that is, ±1, ±2, ±3, ±6.

Since q(1) = 1 − 6 + 11 − 6 = 0, λ − 1 is a factor. Long division shows that

$$q(\lambda) = (\lambda - 1)(\lambda^2 - 5\lambda + 6) = (\lambda - 1)(\lambda - 2)(\lambda - 3)$$

Thus the possible values of λ are λ = 1, 2, 3.
Now we must calculate the null spaces of $A - I$, $A - 2I$, $A - 3I$.


      ECHELON A-ID 3
1.000 0.000 1.000
0.000 1.000 1.000
0.000 0.000 0.000

so the null space is spanned by a single vector $v_1$.

      ECHELON A-2×ID 3
1.000 0.000 0.500
0.000 1.000 ¯1.000
0.000 0.000 0.000

so the null space is spanned by a single vector $v_2$.

      ECHELON A-3×ID 3
1.00E0 1.00E0 0.00E0
3.47E¯18 ¯3.47E¯18 1.00E0
0.00E0 0.00E0 0.00E0

so the null space is spanned by a single vector $v_3$ (entries such as 3.47E¯18 are roundoff error and stand for 0).

If we take any matrix of the form

$$P = [\alpha v_1 \mid \beta v_2 \mid \gamma v_3]$$

with $\alpha\beta\gamma \neq 0$, then $Y = PX'$ defines a new coordinate system in which the matrix of the linear transformation $Y = AX$ becomes

$$P^{-1}AP = \begin{bmatrix} 1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{bmatrix}$$


As the above calculation makes clear, $p(\lambda) = \det(A - \lambda I)$ is a polynomial in λ.




DEFINITION 7.3

1. Let A be an n-by-n matrix. The characteristic polynomial of A is $p(\lambda) = \det(A - \lambda I)$.

2. The eigenvalues of A are the roots of the characteristic polynomial.

3. Let λ be an eigenvalue of A. The eigenspace belonging to λ is the null space of $A - \lambda I$. It is the set of vectors v such that $Av = \lambda v$.

4. Let λ be an eigenvalue of A. An eigenvector belonging to λ is any nonzero element of the eigenspace of λ.

The zero vector is not considered to be an eigenvector, since it is of no use in defining a new coordinate system.
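Part 3 of the definition can be carried out exactly for matrices with rational entries and a rational eigenvalue. The following Python sketch plays the role of ECHELON, but in exact arithmetic, so no roundoff entries like 3.47E¯18 appear; the helper names are hypothetical. It row-reduces $A - \lambda I$ and reads off one basis vector of the eigenspace for each free variable.

```python
from fractions import Fraction


def rref(M):
    """Reduced row echelon form with exact fractions; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue                            # no pivot in this column: free variable
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]          # scale the pivot to 1
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def eigenspace_basis(A, lam):
    """Basis of the null space of A - lam*I, one vector per free variable."""
    n = len(A)
    M = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    R, pivots = rref(M)
    basis = []
    for free in [c for c in range(n) if c not in pivots]:
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][free]               # solve each pivot row for its pivot variable
        basis.append(v)
    return basis


# The matrix [[1, 1], [0, 1]] has a one-dimensional eigenspace for λ = 1.
print(eigenspace_basis([[1, 1], [0, 1]], 1))
```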


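PROPOSITION 7.12 Let A be an n-by-n matrix. The characteristic polynomial of A is of the form

$$p(\lambda) = (-1)^n\lambda^n + c_1\lambda^{n-1} + c_2\lambda^{n-2} + \cdots + c_{n-1}\lambda + c_n$$

Further $c_n = \det(A)$.

Proof Expanding $\det(A - \lambda I)$ by minors down the first column, we have

$$\det(A - \lambda I) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} - \lambda & \cdots & a_{2n}\\ \vdots & & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda\end{vmatrix} = (a_{11} - \lambda)\det(B - \lambda I) + \sum_{i=2}^{n} a_{i1}(A - \lambda I)_{i1}$$

The first term is $(a_{11} - \lambda)$ times $\det(B - \lambda I)$, where B = 1 1↓A. Thus if we assume by induction that an $(n-1)$-by-$(n-1)$ matrix has characteristic polynomial beginning $(-1)^{n-1}\lambda^{n-1}$, then the first term begins $(-1)^n\lambda^n$. Each remaining minor $(A - \lambda I)_{i1}$, $i \geq 2$, contains only $n - 2$ of the diagonal entries $a_{jj} - \lambda$, so it is a polynomial of degree at most $n - 2$. Hence the highest-degree term in this expansion is $(-1)^n\lambda^n$.

Further, $c_n = p(0) = \det(A - 0 \cdot I) = \det(A)$.

Note: It may be shown that the coefficient of the $(n-k)$-degree term is, except for sign, the sum of the determinants det(A[V;V]) as V ranges over all vectors of k distinct indices in increasing order, that is, the sum of the k-by-k principal minors of A.

Proposition 7.12 shows that an n-by-n matrix has at most n eigenvalues.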


PROPOSITION 7.13 Let A be a matrix and let $v_1, \ldots, v_k$ be eigenvectors belonging to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, respectively. Then if the $\lambda_i$ are all different, the set of vectors $\{v_1, \ldots, v_k\}$ is linearly independent.




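Proof Suppose that the set is dependent. Then there is an $l < k$ such that $v_1, \ldots, v_l$ are independent with $v_{l+1} = a_1v_1 + a_2v_2 + \cdots + a_lv_l$ (take l as large as possible; $l \geq 1$ because $v_1 \neq 0$). Multiplying through the equation by A, we get

$$\lambda_{l+1}v_{l+1} = a_1\lambda_1v_1 + a_2\lambda_2v_2 + \cdots + a_l\lambda_lv_l$$

Multiplying the original equation by $\lambda_{l+1}$ gives

$$\lambda_{l+1}v_{l+1} = a_1\lambda_{l+1}v_1 + a_2\lambda_{l+1}v_2 + \cdots + a_l\lambda_{l+1}v_l$$

Subtracting these two equations gives

$$0 = a_1(\lambda_1 - \lambda_{l+1})v_1 + a_2(\lambda_2 - \lambda_{l+1})v_2 + \cdots + a_l(\lambda_l - \lambda_{l+1})v_l$$

Since $v_1, \ldots, v_l$ are independent, we have

$$a_1(\lambda_1 - \lambda_{l+1}) = a_2(\lambda_2 - \lambda_{l+1}) = \cdots = a_l(\lambda_l - \lambda_{l+1}) = 0$$

Since the eigenvalues are all distinct, it follows that $a_1 = a_2 = \cdots = a_l = 0$; hence $v_{l+1} = 0$. But then $v_{l+1}$ is not an eigenvector.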


Proposition 7.13 gives us our simplest test for diagonalizability. 


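PROPOSITION 7.14 Let A be an n-by-n matrix. If A has n distinct eigenvalues, then there is a coordinate change $Y = PX'$ such that $P^{-1}AP$ is diagonal.

Proof Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues. Since $A - \lambda_iI$ is singular, the dimension of the null space of $A - \lambda_iI$ is at least 1, and so there is a $v_i \neq 0$ such that $Av_i = \lambda_iv_i$. Let $P = [v_1|v_2|\cdots|v_n]$. By Proposition 7.13 the columns of P are independent, so P is invertible, and since $AP = PD$ with D the diagonal matrix of the $\lambda_i$, we have $P^{-1}AP = D$.

If the matrix A does not have n distinct eigenvalues, then it may not be diagonalizable.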


EXAMPLE 7.8 Is

$$A = \begin{bmatrix} 1 & 1\\ 0 & 1\end{bmatrix}$$

diagonalizable?




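Solution The characteristic polynomial is

$$p(\lambda) = \begin{vmatrix} 1 - \lambda & 1\\ 0 & 1 - \lambda\end{vmatrix} = (1 - \lambda)^2$$

and the eigenspace belonging to λ = 1 is the null space of

$$A - I = \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}$$

that is, all vectors of the form $t(1, 0)$. We may choose $v_1 = (1, 0)$ to define the $x_1'$ axis, but we cannot get a second independent eigenvector to define the $x_2'$ axis.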


DEFINITION 7.4 Let $p(\lambda)$ be the characteristic polynomial of A. The eigenvalue $\lambda_i$ has multiplicity k if $(\lambda - \lambda_i)^k$ divides $p(\lambda)$ but $(\lambda - \lambda_i)^{k+1}$ does not divide $p(\lambda)$.

In Example 7.8 above the eigenvalue λ = 1 has multiplicity equal to 2. The example immediately generalizes.


PROPOSITION 7.15 Let λ be an eigenvalue of A. If the dimension of the null space of $A - \lambda I$ is less than the multiplicity of λ, then A is not diagonalizable.
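Proposition 7.15 suggests a direct test: compare the dimension of the null space of $A - \lambda I$, which is n minus the rank of $A - \lambda I$, with the number of times $(\lambda - \lambda_0)$ divides $p(\lambda)$. Here is a Python sketch under the assumptions that the eigenvalue is exact (an integer or fraction) and the coefficients of the characteristic polynomial are already known; the function names are hypothetical.

```python
from fractions import Fraction


def rank(M):
    """Rank by exact Gaussian elimination."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    r = 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, rows):
            f = R[i][c] / R[r][c]
            R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        r += 1
        if r == rows:
            break
    return r


def eigenspace_dimension(A, lam):
    """Dimension of the null space of A - lam*I."""
    n = len(A)
    M = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(M)


def multiplicity(coeffs, lam):
    """How many times (λ - lam) divides the polynomial (coefficients highest power first)."""
    k = 0
    while len(coeffs) > 1:
        quotient, carry = [], 0
        for c in coeffs:
            carry = carry * lam + c
            quotient.append(carry)
        if carry != 0:                          # nonzero remainder: stop
            break
        coeffs = quotient[:-1]                  # drop the zero remainder
        k += 1
    return k


# Example 7.8: p(λ) = (1 - λ)² = λ² - 2λ + 1, but the eigenspace of λ = 1 is one-dimensional.
A = [[1, 1], [0, 1]]
print(eigenspace_dimension(A, 1), multiplicity([1, -2, 1], 1))   # 1 2
```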


EXAMPLE 7.9 Is the matrix


diagonalizable? 


Solution

$$p(\lambda) = (3 - \lambda)(\lambda - 1)^2 - (\lambda - 1) = (\lambda - 1)[(3 - \lambda)(\lambda - 1) - 1] = -(\lambda - 1)(\lambda - 2)^2$$

Thus the eigenvalues are λ = 1, 2, 2. The corresponding null spaces are given by

      ECHELON A-ID 3
1.00E0 0.00E0 0.00E0
1.73E¯18 1.00E0 5.00E¯1
3.47E¯18 0.00E0 0.00E0

      ECHELON A-2×ID 3
1.00E0 0.00E0 1.00E0
1.73E¯18 1.00E0 1.73E¯18
2.60E¯18 0.00E0 2.60E¯18

(entries such as 1.73E¯18 are roundoff error and should be read as 0). The null space of $A - 2I$ is one-dimensional and the multiplicity of the eigenvalue λ = 2 is two. Therefore the matrix is not diagonalizable.


Computing the polynomial $p(\lambda) = \det(A - \lambda I)$ can be tedious. Here is a way to compute it by machine. First we use the definition $p(\lambda) = \det(A - \lambda I)$ to compute some specific values of $p(\lambda)$. The function POLYAT takes A as a left argument and a vector $X = (x_1, x_2, \ldots, x_k)$ as a right argument. It returns the vector $(p(x_1), p(x_2), \ldots, p(x_k))$.

      ∇ Z←A POLYAT X
        Z←⍳0                                  ⍝ empty result for an empty X
        →(0=⍴X)/0                             ⍝ stop when no points remain
        Z←(DET A-(1↑X)×ID 1↑⍴A),A POLYAT 1↓X  ⍝ p at the first point, then recur on the rest
      ∇

The function is defined by induction on the number of components of X.

Using this function, we can find a number of points $(x_i, y_i)$ on the graph of $p(\lambda)$. If A is n by n and we have more than n points on the graph, we can then use domino to calculate the coefficients of $p(\lambda)$, because the least-squares polynomial of degree n through the points can only be $p(\lambda)$.
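The same idea can be sketched in Python. Instead of the least-squares fit with domino described above, this version swaps in exact interpolation: it samples p at the n + 1 points 0, 1, …, n and solves the resulting Vandermonde system with exact fractions, so the coefficients come out without roundoff. `poly_at` imitates POLYAT; all names here are hypothetical.

```python
from fractions import Fraction


def det(A):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return result


def poly_at(A, xs):
    """Like POLYAT: the values det(A - x I) for each x in xs."""
    n = len(A)
    return [det([[A[i][j] - (x if i == j else 0) for j in range(n)] for i in range(n)])
            for x in xs]


def solve(M, rhs):
    """Solve the square system M c = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]


def char_poly(A):
    """Coefficients of det(A - λI), highest power first."""
    n = len(A)
    xs = list(range(n + 1))
    V = [[x ** k for k in range(n, -1, -1)] for x in xs]   # powers n, ..., 0, like X∘.*n,...,0
    return solve(V, poly_at(A, xs))


print([int(c) for c in char_poly([[1, 0, 0], [0, 2, 0], [0, 0, 3]])])   # [-1, 6, -11, 6]
```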


EXAMPLE 7.10. Is the matrix 


' 
1 
1 
i 

2 


> 
o 
on 


diagonalizable? (It is known to have integral eigenvalues.) 


Solution A polynomial of degree 4 is determined by its values at five points. Of course we can use more than five if we wish; it can't hurt.

      Y←A POLYAT X←⍳7
      C←Y⌹X∘.*4 3 2 1 0
      C
1 ¯2 ¯3 4 4

So $p(\lambda) = \lambda^4 - 2\lambda^3 - 3\lambda^2 + 4\lambda + 4$. Any integer root of a polynomial with integer coefficients divides the constant term, so we should try λ = ±1, ±2, ±4.

$$p(-1) = 1 + 2 - 3 - 4 + 4 = 0$$
$$p(2) = 16 - 16 - 12 + 8 + 4 = 0$$

and these are the only two roots among the possibilities. Long division gives

$$p(\lambda) = (\lambda + 1)(\lambda^3 - 3\lambda^2 + 4) = (\lambda + 1)(\lambda - 2)(\lambda^2 - \lambda - 2)$$

But $\lambda^2 - \lambda - 2 = (\lambda + 1)(\lambda - 2)$, so $p(\lambda)$ factors completely as

$$p(\lambda) = (\lambda + 1)^2(\lambda - 2)^2$$

The eigenvalues are thus λ = −1, −1, 2, 2. Both have a multiplicity of 2.


ECHELON A~-1x10 4 
1.0060 1.0020 «3. 25E 191 0060 

247618 -3.47E 18 1.0020 1.0060 

0 0060 «0000-0 00D .94E 1B 

© 0060 «0. 000-0 00ED 6 94E 1B 

ECHELON A-2x1D 4 

10060 1.796 18 1.0060 5 001 

0.0060 1.0060 1.0060 0. 00€0 

0.0060 0 00£0 «0 O0ED~— 0000 

00060 0 00£0 «0 00ED-— (0.00 

So the eigenspace belonging to \ = —1 is the column space of 


and the eigenspace belonging to \ =2 


oon 


is the column space of 


eet 

10 . 
ri feel [sr i 
oo. agen 




Thus putting 


=! =) =1 =! 
1 0 1 0 
Sirs Tet ar 
0 1 0 2. 
we have 
-! 000 
ane =i cro 
0 020 
0 002 


The matrix A is diagonalizable. 


If the eigenspace of λ has too low a dimension, then A will not be diagonalizable. There is another way in which an n-by-n matrix A may fail to be diagonalizable: the characteristic polynomial may fail to have n real roots (counting multiplicity). There is a way out of the latter problem, however. The fundamental theorem of algebra states that a polynomial always factors completely into linear factors if complex numbers are allowed as roots. This means that a matrix may become diagonalizable if we allow our scalars to be complex numbers as well as real numbers. This is a very useful device for some applications, particularly for the theory of linear differential equations.


DEFINITION 7.5 The eigenvalues of a matrix A are all the roots, real and complex, of the characteristic polynomial.

The definitions of eigenspace and eigenvector remain unchanged, with the understanding that vectors and matrices may have complex as well as real entries. The space of n-tuples of complex numbers is denoted by $C^n$.


EXAMPLE 7.11 Is the matrix $R_{\pi/2}$ diagonalizable?


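Solution A rotation matrix $R_\theta$ ($\theta \neq k\pi$) will not be diagonalizable if we restrict our attention to real scalars, because the only vector v parallel to $R_\theta v$ is the vector v = 0. Using complex scalars, however, we have, for θ = π/2,

$$p(\lambda) = |R_{\pi/2} - \lambda I| = \begin{vmatrix} -\lambda & -1\\ 1 & -\lambda\end{vmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i)$$

where $i = \sqrt{-1}$. Thus the eigenvalues are λ = ±i.
The corresponding eigenspaces can be found by row reduction.

λ = i: We row reduce

$$R_{\pi/2} - iI = \begin{bmatrix} -i & -1\\ 1 & -i\end{bmatrix}$$

to find its null space. Interchanging the two rows, we have

$$\begin{bmatrix} 1 & -i\\ -i & -1\end{bmatrix}$$

Pivoting on the 1;1 entry gives

$$\begin{bmatrix} 1 & -i\\ 0 & -1 + i(-i)\end{bmatrix} = \begin{bmatrix} 1 & -i\\ 0 & 0\end{bmatrix}$$

Thus the null space consists of all vectors of the form $t[i\ \ 1]^T$ where t is now any complex number.

λ = −i: We row reduce

$$R_{\pi/2} + iI = \begin{bmatrix} i & -1\\ 1 & i\end{bmatrix}$$

to find the eigenspace belonging to −i. Interchanging the rows and pivoting on the 1;1 entry gives

$$\begin{bmatrix} 1 & i\\ i & -1\end{bmatrix} \rightarrow \begin{bmatrix} 1 & i\\ 0 & -1 - i(i)\end{bmatrix} = \begin{bmatrix} 1 & i\\ 0 & 0\end{bmatrix}$$

and the eigenspace consists of all vectors of the form $t[-i\ \ 1]^T$ where t is any complex scalar.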
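Notice that

$$\begin{bmatrix} 0 & -1\\ 1 & 0\end{bmatrix}\begin{bmatrix} i\\ 1\end{bmatrix} = \begin{bmatrix} -1\\ i\end{bmatrix} = i\begin{bmatrix} i\\ 1\end{bmatrix}, \qquad \begin{bmatrix} 0 & -1\\ 1 & 0\end{bmatrix}\begin{bmatrix} -i\\ 1\end{bmatrix} = \begin{bmatrix} -1\\ -i\end{bmatrix} = -i\begin{bmatrix} -i\\ 1\end{bmatrix}$$

So (i, 1) and (−i, 1) are eigenvectors belonging to i and −i, respectively. Taking

$$P = \begin{bmatrix} i & -i\\ 1 & 1\end{bmatrix}$$

and using the coordinate change $X = PX'$ in $C^2$, we find

$$P^{-1}R_{\pi/2}P = \frac{1}{2}\begin{bmatrix} -i & 1\\ i & 1\end{bmatrix}\begin{bmatrix} 0 & -1\\ 1 & 0\end{bmatrix}\begin{bmatrix} i & -i\\ 1 & 1\end{bmatrix} = \frac{1}{2}\begin{bmatrix} -i & 1\\ i & 1\end{bmatrix}\begin{bmatrix} -1 & -1\\ i & -i\end{bmatrix} = \begin{bmatrix} i & 0\\ 0 & -i\end{bmatrix}$$

Thus $R_{\pi/2}$ is diagonalizable when complex scalars are allowed.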
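Python's built-in complex numbers (written `1j` for i) allow a quick numerical check of this example. The sketch below, with hypothetical helper names, verifies the two eigenvector equations and forms $P^{-1}R_{\pi/2}P$ with the 2-by-2 inverse formula; it assumes $R_{\pi/2}$ is the counterclockwise rotation [[0, −1], [1, 0]] used above.

```python
R = [[0, -1], [1, 0]]           # R_{π/2}
P = [[1j, -1j], [1, 1]]         # columns: eigenvectors (i, 1) and (-i, 1)


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Each column of P is an eigenvector: R v = λ v with λ = i and λ = -i.
for lam, col in ((1j, 0), (-1j, 1)):
    v = [[P[0][col]], [P[1][col]]]
    Rv = matmul(R, v)
    assert all(abs(Rv[r][0] - lam * v[r][0]) < 1e-12 for r in range(2))

D = matmul(inverse_2x2(P), matmul(R, P))
expected = [[1j, 0], [0, -1j]]
assert all(abs(D[r][c] - expected[r][c]) < 1e-12 for r in range(2) for c in range(2))
```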


As the computations in the above example imply, many of the techniques developed in previous chapters for vectors and matrices of real numbers hold for complex numbers as well. The techniques that remain unchanged include row reduction, the related theory of flats and subspaces, including dimension, and so on. Modifications are required for techniques involving distance, and any formula using transposes.


PROPOSITION 7.16 Allowing complex scalars, the matrix A is diagonalizable if and only if, for each eigenvalue λ, the dimension of the eigenspace of λ is equal to the multiplicity of λ. In particular, A is diagonalizable if all the eigenvalues have multiplicity equal to 1.


A method of handling matrices with complex entries with an APL processor that does not recognize complex numbers is sketched in exercise 21.


JORDAN BLOCKS 


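The matrix of Example 7.8 is typical of those which are not diagonalizable. The basic nondiagonalizable pattern is

$$J_\lambda = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0\\ 0 & \lambda & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda & 1\\ 0 & 0 & \cdots & 0 & \lambda\end{bmatrix} \tag{7.1}$$

The matrix $J_\lambda$ has λ's on the diagonal and 1's on the first superdiagonal. It is not difficult to see (exercise 26) that λ is the only eigenvalue of $J_\lambda$ and that the corresponding eigenspace is one-dimensional.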




DEFINITION 7.6 The matrix of equation (7.1) is a Jordan block matrix.

The matrix of Example 7.8 is a 2-by-2 Jordan block with λ = 1. A Jordan block may be regarded as a "generalized eigenvalue," with an ordinary eigenvalue being simply a 1-by-1 Jordan block. It may be shown that there is a unique set of Jordan blocks associated to each square matrix and that, if the definition of "diagonal matrix" is loosened up to include matrices with Jordan blocks on the diagonal and zeros elsewhere, then every square matrix is "diagonalizable." For the sake of completeness we record the following proposition without proof.


PROPOSITION 7.17 Allowing complex scalars, given any square matrix A there is an invertible matrix P such that

$$P^{-1}AP = \begin{bmatrix} J_1 & & 0\\ & \ddots & \\ 0 & & J_k\end{bmatrix}$$

where $J_1, \ldots, J_k$ are Jordan blocks.


EXERCISES 7.2 


In exercises 1 through 10 decide if the given matrix is diagonalizable. (The characteristic polynomials have been designed to factor.)


Sey es Ma 202 
ufo il 3 [1 | = [= a 
13 -2 Oo -1 0 4 2 1 
4 jo 2 0) 5. 2 30 6. —3 -!1 <I 
Oo 3 =1 2-21 -! =I 1 
;| 1 00 0 
7 =4 , 
1 
—3 -1 
id) 
2 0, 
a aoe iT 
3 aed 


11. Show that the eigenvalues of $A + aI$ are $\lambda_i + a$, where the $\lambda_i$ are the eigenvalues of A.

12. Show that the eigenvalues of $aA$ are $a\lambda_i$, where the $\lambda_i$ are the eigenvalues of A.

13. Show that if λ is an eigenvalue of A with eigenvector v, then $\lambda^2$ is an eigenvalue of $A^2$ with eigenvector v.

14. Show that if λ is an eigenvalue of A with eigenvector v, then $\lambda^k + a_{k-1}\lambda^{k-1} + \cdots + a_0$ is an eigenvalue of $A^k + a_{k-1}A^{k-1} + \cdots + a_0I$ with eigenvector v.

15. Show that A and $A^T$ have the same eigenvalues.




16. Show that
(a) A is singular if and only if 0 is an eigenvalue of A.
(b) If A is nonsingular, then the eigenvalues of $A^{-1}$ are $1/\lambda_i$, where the $\lambda_i$ are the eigenvalues of A.
17. If A has zeros below the main diagonal, show that the eigenvalues of A are the diagonal entries.
Hint: Expand $\det(A - \lambda I)$ by minors.
18. Given any polynomial $p(\lambda) = (-1)^n(a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n)$, show that it is the characteristic polynomial of the matrix

$$\begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{bmatrix}$$

Hint: Expand by minors.
19. Let $P_S$ be the perpendicular projection matrix for the subspace S.
(a) Show that every v ≠ 0 in S is an eigenvector of $P_S$ for the eigenvalue λ = 1.
(b) Show that every v ≠ 0 in $S^\perp$ is an eigenvector of $P_S$ for the eigenvalue λ = 0.
(c) Show that λ = 0, λ = 1 are the only eigenvalues of $P_S$.
Hint: Let $P_Su = \lambda u$ and write u = v + w with v in S and w in $S^\perp$.
(d) Let the columns of U form an orthonormal basis of S and the columns of V an orthonormal basis of $S^\perp$. Let Q = [U|V]. Show $Q^{-1} = Q^T$ and use matrix algebra to compute $Q^TP_SQ$.
Hint: $P_S = UU^T$.
(e) From (d) show that $\det(P_S - \lambda I) = (-1)^n\lambda^{n-k}(\lambda - 1)^k$, where k is the dimension of S.


20. Let $R_S$ be the reflection matrix for the subspace S.
(a) Show that every v ≠ 0 in S is an eigenvector of $R_S$ for the eigenvalue λ = 1.
(b) Show that every v ≠ 0 in $S^\perp$ is an eigenvector of $R_S$ for the eigenvalue λ = −1.
(c) Show that λ = 1 and λ = −1 are the only eigenvalues of $R_S$.
Hint: Let $R_Su = \lambda u$ and write u = v + w with v in S and w in $S^\perp$.
(d) Let the columns of U form an orthonormal basis of S and the columns of V an orthonormal basis of $S^\perp$. Let Q = [U|V]. Show $Q^{-1} = Q^T$ and use matrix algebra to compute $Q^TR_SQ$.
Hint: $R_S = 2UU^T - I$.
(e) From (d) show that $\det(R_S - \lambda I) = (-1)^n(\lambda - 1)^k(\lambda + 1)^{n-k}$, where k is the dimension of S.


21. (Inverting complex matrices in APL) Let the complex number $z = a + bi$ correspond to the matrix $A = \begin{bmatrix} a & -b\\ b & a\end{bmatrix}$.

(a) Show that if $z_1$ corresponds to $A_1$ and $z_2$ corresponds to $A_2$, then $z_1z_2$ corresponds to $A_1A_2$.

(b) Show that if $z_1$ corresponds to $A_1$ and $z_2$ corresponds to $A_2$, then $z_1/z_2$ corresponds to A1⌹A2.

(c) Show that the complex conjugate $\bar{z}$ corresponds to $A^T$.
Let the complex matrix $Z = X + iY$, with X and Y real, correspond to the partitioned matrix

$$G = \begin{bmatrix} X & -Y\\ Y & X\end{bmatrix}$$

(d) Show that if $Z_1$ corresponds to $G_1$ and $Z_2$ corresponds to $G_2$, then $Z_1Z_2$ corresponds to $G_1G_2$.

(e) Show that if $Z_1$ corresponds to $G_1$ and $Z_2$ corresponds to $G_2$, then $Z_2^{-1}Z_1$ corresponds to G1⌹G2. In particular ⌹G2 corresponds to $Z_2^{-1}$.

(f) Let $Z^*$ denote the coordinatewise conjugate of the transpose of Z. Show that $Z^*$ corresponds to $G^T$.


(Computer assignment) Use the result of exercise 21(e) to do exercises 22 through 25 using domino.


22 Solve vt + (=f =2 
deat (+e =i 
NaH 


23. Solve ktytesl 


xa=y-2=2 


y 14s 
ta = 
24. Invert eo we 
2.0) 0 

Ce 

fide. 

25. Invert iG? FR ck ; 
1 




26. (a) Show that the n-by-n Jordan block $J_a$ has characteristic polynomial $p(\lambda) = (-1)^n(\lambda - a)^n$.
Hint: Expand by minors.
(b) Show that the eigenspace of $J_a$ is one-dimensional.
Hint: What is the echelon form of $J_a - aI$?


27. Let $J_a$ be an n-by-n Jordan block and let $e_i$ be the vector with $i$th component equal to 1 and all other components zero. Show

$$J_ae_1 = ae_1, \qquad J_ae_k = ae_k + e_{k-1}, \quad k > 1$$


28. Suppose that $P^{-1}AP = J_a$, a Jordan block. Let $P = [v_1|v_2|\cdots|v_n]$. Show that the vectors $v_1, v_2, \ldots, v_n$ satisfy the recursive equations

$$(A - aI)v_1 = 0, \qquad (A - aI)v_k = v_{k-1}, \quad k > 1$$

Hint: Exercise 27.

29. The matrices A below have characteristic polynomials of the form $p(\lambda) = (-1)^n(\lambda - a)^n$ and one-dimensional eigenspaces. Find a and use the equations of exercise 28 to find matrices P such that $P^{-1}AP = J_a$.


(a) [; Sal (b) [ 3 ‘| (ce) 1o «) -1 2 
Cot -1 4 a4 -) 13 
03 -| 26 


7.3* Powers of Matrices Revisited 


Powers of matrices were discussed in optional Section 3.6*, where difference equations and stochastic matrices were discussed. If A can be diagonalized, then there is another approach to analyzing the powers of A.

Notice that if D is a diagonal matrix, then

$$D^k = \begin{bmatrix} d_1 & & 0\\ & \ddots & \\ 0 & & d_n\end{bmatrix}^k = \begin{bmatrix} d_1^k & & 0\\ & \ddots & \\ 0 & & d_n^k\end{bmatrix}$$

So we need not carry out a full-blown matrix multiplication; we just take the powers of the diagonal entries. Now suppose that A can be diagonalized:

$$P^{-1}AP = D$$

Then $A = PDP^{-1}$, $A^2 = PDP^{-1}PDP^{-1} = PD^2P^{-1}$, …, $A^n = PD^nP^{-1}$. This gives us a formula for the nth power of A.


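EXAMPLE 7.12 In Example 3.23 the first twenty Fibonacci numbers were computed by recursion. Find a formula for the nth Fibonacci number.

Solution The Fibonacci numbers are defined by

$$f_1 = f_2 = 1, \qquad f_{n+1} = f_n + f_{n-1}$$

Following the prescription of Section 3.6*, we write this as

$$v_n = Av_{n-1}$$

where

$$v_n = \begin{bmatrix} f_n\\ f_{n-1}\end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 1\\ 1 & 0\end{bmatrix}$$

so that $v_n = A^{n-2}v_2$ with $v_2 = (1, 1)$. To compute $A^{n-2}$ we diagonalize A. The characteristic polynomial is

$$p(\lambda) = \begin{vmatrix} 1 - \lambda & 1\\ 1 & -\lambda\end{vmatrix} = \lambda^2 - \lambda - 1$$

so the eigenvalues of A are

$$\lambda = \frac{1 \pm \sqrt{1 - 4(-1)}}{2} = \frac{1 \pm \sqrt{5}}{2}$$

To find the eigenvectors we compute the null space of $A - \lambda I$:

$$\begin{bmatrix} 1 - \frac{1+\sqrt5}{2} & 1\\ 1 & -\frac{1+\sqrt5}{2}\end{bmatrix} \text{ row-reduces to } \begin{bmatrix} 1 & -\frac{1+\sqrt5}{2}\\ 0 & 0\end{bmatrix}$$

and

$$\begin{bmatrix} 1 - \frac{1-\sqrt5}{2} & 1\\ 1 & -\frac{1-\sqrt5}{2}\end{bmatrix} \text{ row-reduces to } \begin{bmatrix} 1 & -\frac{1-\sqrt5}{2}\\ 0 & 0\end{bmatrix}$$

The corresponding eigenspaces are

$$t\begin{bmatrix} \frac{1+\sqrt5}{2}\\ 1\end{bmatrix} \quad\text{and}\quad t\begin{bmatrix} \frac{1-\sqrt5}{2}\\ 1\end{bmatrix}$$

Taking

$$P = \begin{bmatrix} \frac{1+\sqrt5}{2} & \frac{1-\sqrt5}{2}\\ 1 & 1\end{bmatrix}$$

we have

$$P^{-1}AP = D = \begin{bmatrix} \frac{1+\sqrt5}{2} & 0\\ 0 & \frac{1-\sqrt5}{2}\end{bmatrix}$$

and

$$P^{-1}v_2 = \frac{1}{\sqrt5}\begin{bmatrix} \frac{1+\sqrt5}{2}\\ -\frac{1-\sqrt5}{2}\end{bmatrix}$$

so that $v_n = PD^{n-2}P^{-1}v_2$ gives

$$f_n = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^n - \left(\frac{1-\sqrt5}{2}\right)^n\right]$$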


Notice that although the nth Fibonacci number is obviously an integer, irrational numbers appear in the formula for the nth term. In fact complex numbers may even be necessary to answer a question that seemingly involves only whole numbers. This is why complex scalars must be introduced.
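The closed formula can be compared with the recursion directly. In this Python sketch (function names hypothetical) the formula is evaluated in floating point and rounded to the nearest integer; that rounding step is an assumption that holds only while the floating-point error stays below 1/2, which is comfortably true for moderate n.

```python
import math


def fib_recursive(n):
    """f_1 = f_2 = 1, f_{k+1} = f_k + f_{k-1}, computed by iteration."""
    a, b = 1, 1          # f_1, f_2
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def fib_formula(n):
    """f_n = (φ^n - ψ^n)/√5 with φ, ψ = (1 ± √5)/2, rounded to the nearest integer."""
    s5 = math.sqrt(5)
    phi, psi = (1 + s5) / 2, (1 - s5) / 2
    return round((phi ** n - psi ** n) / s5)


print(fib_formula(20))   # 6765
```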


EXAMPLE 7.13 Find a formula for $x_n$ if

$$x_{n+1} = x_n - x_{n-1} + x_{n-2}$$


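Solution

$$\begin{bmatrix} x_{n+1}\\ x_n\\ x_{n-1}\end{bmatrix} = \begin{bmatrix} 1 & -1 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}\begin{bmatrix} x_n\\ x_{n-1}\\ x_{n-2}\end{bmatrix}$$

$$p(\lambda) = \begin{vmatrix} 1 - \lambda & -1 & 1\\ 1 & -\lambda & 0\\ 0 & 1 & -\lambda\end{vmatrix} = (1 - \lambda)\begin{vmatrix} -\lambda & 0\\ 1 & -\lambda\end{vmatrix} - \begin{vmatrix} -1 & 1\\ 1 & -\lambda\end{vmatrix}$$
$$= -\lambda^3 + \lambda^2 - \lambda + 1 = -(\lambda - 1)(\lambda^2 + 1) = -(\lambda - 1)(\lambda - i)(\lambda + i)$$

The eigenvalues λ = 1, i, −i are distinct, so A is diagonalizable.
The eigenspaces are the null spaces of $A - \lambda I$ for λ = 1, ±i.

λ = 1:

$$\begin{bmatrix} 0 & -1 & 1\\ 1 & -1 & 0\\ 0 & 1 & -1\end{bmatrix} \text{ row-reduces to } \begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & -1\\ 0 & 0 & 0\end{bmatrix}$$

The eigenspace is $t(1, 1, 1)$.

λ = i:

$$\begin{bmatrix} 1 - i & -1 & 1\\ 1 & -i & 0\\ 0 & 1 & -i\end{bmatrix} \text{ row-reduces to } \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & -i\\ 0 & 0 & 0\end{bmatrix}$$

The eigenspace is $t(-1, i, 1)$.

λ = −i:

$$\begin{bmatrix} 1 + i & -1 & 1\\ 1 & i & 0\\ 0 & 1 & i\end{bmatrix} \text{ row-reduces to } \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & i\\ 0 & 0 & 0\end{bmatrix}$$

The eigenspace is $t(-1, -i, 1)$.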


Since 


we have 


or 


prat 
2 


iy 
Trt + (-1y 
P+ imts (it 
Veet (iy 


~iy-? 




In the last two examples it was not necessary to compute $P^{-1}$. If we think of the coordinate change given by $X = PX'$, we see that $P^{-1}$ was used only to get the coordinates of the starting vector $v_k$ in the X′ coordinate system. This can be done by solving

$$PX' = v_k$$

which is usually less work than inverting P when P is larger than 2 by 2. With this procedure one first obtains W, the solution of $PX' = v_k$, and then obtains an expression for $v_n$ as

$$v_n = PD^mW$$

where m = n − k.
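The procedure can be sketched in Python for a small, hypothetical example: A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with eigenvectors (1, 1) and (1, −1). The sketch solves $PW = v_0$ instead of inverting P (for a 2-by-2 system it simply uses Cramer's rule, in exact fractions) and compares $PD^mW$ with m direct multiplications by A. All names are our own.

```python
from fractions import Fraction

A = [[2, 1], [1, 2]]            # hypothetical example matrix
P = [[1, 1], [1, -1]]           # columns: eigenvectors for λ = 3 and λ = 1
eigenvalues = [3, 1]


def solve_2x2(M, rhs):
    """Solve M w = rhs for a 2-by-2 M (Cramer's rule, exact)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def power_times_vector(P, eigenvalues, v_start, m):
    """v = P D^m W, where W solves P W = v_start."""
    W = solve_2x2(P, v_start)
    DmW = [lam ** m * w for lam, w in zip(eigenvalues, W)]
    return [sum(P[i][j] * DmW[j] for j in range(2)) for i in range(2)]


def direct(A, v, m):
    """Multiply v by A, m times."""
    for _ in range(m):
        v = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
    return v


print(power_times_vector(P, eigenvalues, [1, 0], 5), direct(A, [1, 0], 5))
```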

The second application of the powers of a matrix discussed in Section 3.6* involved stochastic matrices. In Section 3.6* a matrix was called stochastic if it had nonnegative entries and if the sum of the entries in each row was 1. We prefer to work with columns rather than rows below, so we will call a matrix stochastic if it has nonnegative entries and the sum of the entries in each column is one.

In Example 3.24 a stochastic matrix A was exhibited for which the rows of $A^n$ all approached the same vector as n became large. Since we are working with columns in this section, we replace A by $A^T$ and have a matrix for which the columns of $A^n$ approach a fixed vector.

We are now in a position to analyze this phenomenon in much more detail 
The basic fact is given in the next proposition. 


PROPOSITION 7.18 If the entries of the columns of the square matrix A sum to 1, then λ = 1 is an eigenvalue of A.

Proof First notice that A and $A^T$ have the same characteristic polynomial and hence the same eigenvalues. In fact, since $\det(B) = \det(B^T)$ by Proposition 7.4,

$$p(\lambda) = |A - \lambda I| = |(A - \lambda I)^T| = |A^T - \lambda I|$$

Now let v = (1, 1, …, 1). For any matrix B, Bv is +/B, the vector of row sums of B. Thus if the entries of the columns of A sum to 1, then

$$A^Tv = (1, \ldots, 1) = v$$

which shows that v is an eigenvector for $A^T$ belonging to the eigenvalue λ = 1. It follows that A also has an eigenvalue λ = 1 (v is not, however, necessarily an eigenvector for A).


We will now investigate the existence of

$$\lim_{k\to\infty} A^k$$

by diagonalizing A. The interested reader may check that, if A is not diagonalizable, the same conclusions may be reached using Jordan blocks (see exercise 17).

Since λ = 1 is an eigenvalue of A, there is, assuming A diagonalizable, a matrix P with

$$P^{-1}AP = D = \begin{bmatrix} 1 & & & 0\\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n\end{bmatrix}$$

Now A and D are representations of the same transformation of $R^n$ in different coordinates (the coordinate change is $Y = PX'$, which is continuous), hence $\lim A^k$ exists if and only if $\lim D^k$ exists. Using the formula

$$D^k = \begin{bmatrix} 1 & & & 0\\ & \lambda_2^k & & \\ & & \ddots & \\ 0 & & & \lambda_n^k\end{bmatrix}$$

we see that $\lim D^k$ exists if and only if $\lim \lambda_i^k$ exists for all i = 2, 3, …, n.

Now $|\lambda^k| = |\lambda|^k \to \infty$ if $|\lambda| > 1$, so if the limit $\lim \lambda_i^k$ exists, one must have $|\lambda_i| \leq 1$ for all i. Further, if $|\lambda_i| = 1$, then $\lim \lambda_i^k$ exists only if $\lambda_i = 1$. For example, $\lim (-1)^k$ does not exist. (We leave the general case as exercise 16.)

We conclude, then, that $\lim A^k$ exists if and only if $|\lambda_i| \leq 1$ for all i and $|\lambda_i| = 1$ only if $\lambda_i = 1$.

Now consider the case where $\lambda_1 = 1$ and $|\lambda_i| < 1$ for i = 2, 3, …, n. Then

$$\lim_{k\to\infty} A^k = P\left(\lim_{k\to\infty} D^k\right)P^{-1} = P\,\mathrm{diag}(1, 0, \ldots, 0)\,P^{-1}$$

So the columns of $A^k$ approach multiples of P[;1] as k grows without bound.

Now one can show (exercise 15) that if the entries of the columns of A sum to 1, then the same is true of $A^k$ for all k and hence, by continuity, the entries of the columns of $\lim A^k$ sum to 1. In the case under consideration this means that all the columns of $A^k$ approach that multiple of P[;1] whose components sum to 1. Notice that P[;1] can be chosen to be any eigenvector belonging to $\lambda_1 = 1$.




To summarize: If $\lambda_1 = 1$ and $|\lambda_i| < 1$ for i > 1 and, further, the entries of the columns of A sum to 1, then as k goes to infinity the columns of $A^k$ approach that eigenvector belonging to λ = 1 whose components sum to 1.

It may be shown that a stochastic matrix with no entries equal to zero satisfies the condition $\lambda_1 = 1$ and $|\lambda_i| < 1$ for i > 1 (see Proposition 7.25). This gives the following basic result on stochastic matrices.

PROPOSITION 7.19 Let A be a stochastic matrix and suppose that there is an integer l such that $A^l$ has only positive entries. Then as k goes to infinity, the columns of $A^k$ all approach that eigenvector of λ = 1 whose components sum to 1.

Note: If v is any vector whose components do not sum to 0, then the multiple of v whose components sum to 1 is v÷+/v.
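For a 2-by-2 matrix the eigenvector of λ = 1 can be written down at once, which makes Proposition 7.19 and the note easy to watch in action. The matrix below is a hypothetical example with positive entries and columns summing to 1. The Python sketch scales the eigenvector so its components sum to 1 (v÷+/v in APL) and compares it with the columns of $A^{30}$, computed exactly with fractions; the function names are our own.

```python
from fractions import Fraction

A = [[Fraction(1, 2), Fraction(1, 5)],
     [Fraction(1, 2), Fraction(4, 5)]]     # hypothetical stochastic matrix (columns sum to 1)


def steady_state_2x2(A):
    """Eigenvector of λ = 1, scaled so its components sum to 1.

    (b, 1 - a) solves the first row of (A - I)v = 0; for a column-stochastic
    2-by-2 matrix the second row is then automatic.
    """
    (a, b), _ = A
    v = [b, 1 - a]
    total = sum(v)
    return [x / total for x in v]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


s = steady_state_2x2(A)
print(s)                                   # [Fraction(2, 7), Fraction(5, 7)]

Ak = A
for _ in range(29):                        # Ak becomes A^30
    Ak = matmul(Ak, A)
for j in range(2):                         # every column of A^30 is close to s
    assert all(abs(Ak[i][j] - s[i]) < Fraction(1, 10**12) for i in range(2))
```

Here the second eigenvalue is 3/10, so the columns converge quickly.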


Exampe 7.14 


" 
om 


44 
44 
43 


is the transpose of the matrix of Example 3.24, where it was discovered that 4" is 
fuzzily equal to A%”. The matrix A* has no zero entries, so Proposition 7.19 ap- 
plies. 

‘The matrix (A — 1) row-reduces to 


1o = 
ol -—*% 
00 0 
Hence the null space of A — I is 
F 6 
‘lal or 1 3 
1 10 


Thus the steady state approximated by the columns of A" is (fab ff). = 


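EXAMPLE 7.15 The matrix

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

is clearly a stochastic matrix. The limit $\lim A^k$ does not exist, however. In fact

$$A^k = \begin{cases} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} & \text{if } k \text{ is odd} \\[6pt] \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} & \text{if } k \text{ is even} \end{cases}$$

The characteristic polynomial of $A$ is

$$p(\lambda) = \det\begin{bmatrix} -\lambda & 1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 - 1$$

and hence the eigenvalues are $\lambda_1 = 1$, $\lambda_2 = -1$. Thus not every stochastic matrix satisfies the hypotheses of Proposition 7.18. ■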
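A short Python sketch of Example 7.15 (the helper name `mat_mul` is our own): it lists the first six powers of $A$, which alternate between $A$ and $I$, and recovers the eigenvalues $\pm 1$ as the roots of $\lambda^2 - (\operatorname{trace} A)\lambda + \det A$.

```python
def mat_mul(X, Y):
    """Matrix product, with matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[0, 1],
     [1, 0]]

powers = []            # powers[k - 1] holds A^k
P = A
for k in range(1, 7):
    powers.append(P)
    P = mat_mul(P, A)

# Roots of lambda^2 - (trace) lambda + det for this 2-by-2 matrix.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = (trace * trace - 4 * det) ** 0.5
eigenvalues = sorted([(trace - disc) / 2, (trace + disc) / 2])
print(powers, eigenvalues)
```

The second eigenvalue has absolute value 1, which is exactly why the powers never settle down.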


EXERCISES 7.3* 


In exercises 1 through 6 a sequence of vectors of scalars is defined inductively. Find an expression for the $n$th term.

7. Suppose that a linear difference equation is defined by

$$x_{n+k} = a_{k-1}x_{n+k-1} + \cdots + a_1x_{n+1} + a_0x_n$$

when $x_0, \ldots, x_{k-1}$ are given. Show that the associated eigenvalues are the roots of the polynomial

$$\lambda^k - a_{k-1}\lambda^{k-1} - \cdots - a_1\lambda - a_0$$

Hint: Exercise 18 of Exercises 7.2.


In exercises 8 through 14 decide if the limit $\lim A^k$ exists by diagonalizing $A$. If the limit exists, what is it? Is the matrix stochastic? Does the matrix satisfy the hypotheses of Proposition 7.19?

[The matrices for exercises 8 through 14 are illegible in the source.]


15. Let $v$ be a vector whose components sum to 1 (1=+/v) and let $A$ be a matrix each of whose columns sums to 1 (1∧.=+⌿A).

(a) Show that the components of the vector $w = Av$ sum to 1.
Hint: $w = v_1A[;1] + v_2A[;2] + \cdots + v_nA[;n]$.

(b) If $B$ is a second matrix with 1∧.=+⌿B, show that the same is true of $AB$. Hence, by induction, the same is true of $A^k$, $k > 0$.

(c) Show that (a) and (b) above may be derived from these APL identities: (+⌿A+.×B)=(+⌿A)+.×B, with $A$ a matrix and $B$ a vector or matrix, and (+⌿A)=1+.×A, with $A$ a matrix.

16. Use the fact that any complex number $z$ may be written

$$z = |z|(\cos\theta + i\sin\theta)$$

for suitable $\theta$ $(= \operatorname{Arg}(z))$ and the fact that

$$z^k = |z|^k(\cos k\theta + i\sin k\theta)$$

to show that if $\lim z^k$ exists, then $|z| < 1$ or $z = 1$.

17. Let $J$ be a Jordan block with eigenvalue $\lambda$. Show that if $\lim J^k$ exists, then $|\lambda| < 1$ or $J$ is the 1-by-1 matrix $[1]$.


7.4* Congruences and Affine Transformations in Space

A working knowledge of the congruences (isometries) of three-dimensional space is useful in such diverse applications as crystallography, control of artificial satellites, computer graphics, and theoretical physics. In this section we will use the theory of eigenvalues and eigenvectors to prove a fundamental result about congruences of three-dimensional space: every such congruence may be factored into a rotation, a reflection, and a translation (Proposition 7.21).

We also give a geometric description of an arbitrary, but nonsingular, affine 
transformation of three-dimensional space. 

Recall (Section 5.1) that a congruence $T: \mathbb{R}^3 \to \mathbb{R}^3$ is defined to be a mapping that preserves distances: $d(p, q) = d(T(p), T(q))$. In Section 5.1 it was shown (Proposition 5.7) that any congruence of $\mathbb{R}^2$ can be factored as reflection in a line, a rotation, and a translation. We will reduce the present case to this one after a preliminary transformation. It is first necessary, however, to verify that an isometry of $\mathbb{R}^3$ is an affine function.

Proposition 7.20 An isometry $T: \mathbb{R}^3 \to \mathbb{R}^3$ is affine.

Proof First notice that $T$ must carry lines to lines. This is because three points $p, q, r$ are collinear, with $q$ between $p$ and $r$, say, if and only if

$$d(p, r) = d(p, q) + d(q, r)$$


FIGURE 7.8

Since $T$ is a congruence, we have

$$d(T(p), T(r)) = d(T(p), T(q)) + d(T(q), T(r))$$

and so $T(p)$, $T(q)$, $T(r)$ are collinear with $T(q)$ between $T(p)$ and $T(r)$.

Now we can use the fact that $T$ takes lines to lines to show that it also takes planes to planes. The precise statement we need is that if the four points $p, q, r, s$ are coplanar (but no three of them are collinear), then the four points $T(p)$, $T(q)$, $T(r)$, and $T(s)$ are also coplanar (Figure 7.8).

The four points may always be labeled so that the line through $s$ and $q$ cuts the line through $p$ and $r$ in a fifth point $t$ (you should convince yourself of this).

Now since $p$, $q$, and $r$ are not collinear, the above reasoning shows that $T(p)$, $T(q)$, and $T(r)$ are not collinear and hence determine a plane $\pi$ that contains the line through $T(p)$ and $T(r)$. Hence $\pi$ contains $T(t)$. It follows that $\pi$ contains the line through $T(t)$ and $T(q)$, and hence $\pi$ contains $T(s)$ as well.

In particular $T$ must carry parallelograms to parallelograms, and so the proof of Proposition 5.7 applies to the current case as well. ■


Now we know that our congruence $T: \mathbb{R}^3 \to \mathbb{R}^3$ can be written $T(X) = b + AX$ for some vector $b$ and matrix $A$. Since adding $b$ to $AX$ is a translation, we will restrict our attention to the linear part of $T$, $L(X) = AX$, and show that $A$ can be factored as a reflection and a rotation.

First notice that since the characteristic polynomial

$$p(\lambda) = \det(A - \lambda I) = -\lambda^3 + \cdots$$

is a cubic, it must have a real root. This is because

$$\lim_{\lambda \to \infty} p(\lambda) = -\infty \quad \text{and} \quad \lim_{\lambda \to -\infty} p(\lambda) = \infty$$

so the graph of $p(\lambda)$ must somewhere cross the $\lambda$ axis.
If $\lambda$ is a real eigenvalue of $A$ and $v$ is an eigenvector belonging to $\lambda$, then $v$ and $Av = \lambda v$ must have the same length, so $\lambda = \pm 1$.
Let $u_1$ be a unit eigenvector belonging to a real eigenvalue of $A$. Let $\{u_2, u_3\}$ be an orthonormal basis of $S^\perp$, where $S$ is the subspace generated by $u_1$. Then $U = [u_1 \mid u_2 \mid u_3]$ is orthogonal. By changing the direction of $u_3$ if necessary, we can assure that the new coordinate system given by $X = UX'$ is right-handed; that is, curling the fingers of the right hand from $u_1$ to $u_2$ causes the thumb to point in the direction of $u_3$ (Figure 7.9).

FIGURE 7.9 (Right-hand rule: thumb and fingers)

Isometries preserve angles, and linear transformations carry subspaces to subspaces. It follows that $L$ carries $S^\perp$ to $S^\perp$, since it carries $S$ to $S$. In particular, $Au_2$ and $Au_3$ are perpendicular to $u_1$. Thus in the new coordinate system we have $L(X') = A'X'$, where

$$A' = U^TAU = U^T[Au_1 \mid Au_2 \mid Au_3] = U^T[\pm u_1 \mid Au_2 \mid Au_3] = \begin{bmatrix} \pm 1 & 0 \\ 0 & B \end{bmatrix}$$

Now $A' = U^TAU$ is orthogonal, so $B^T = B^{-1}$. Thus the 2-by-2 matrix $B$ can be thought of as a congruence of $\mathbb{R}^2$, and by Proposition 5.7 it can be factored as a reflection and a rotation. More precisely, Example 5.6 shows that

$$\text{either} \quad B = R_\theta \quad \text{or} \quad B = R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

for some angle of rotation $\theta$. Thus $A'$ has one of the following forms:

(1) $$A' = \begin{bmatrix} 1 & 0 \\ 0 & R_\theta \end{bmatrix}$$

This is a rotation through the angle $\theta$ about the $x'$ axis, that is, a rotation through the angle $\theta$ about the line $\ell(t) = tu_1$.

(2) $$A' = \begin{bmatrix} -1 & 0 \\ 0 & R_\theta \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & R_\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & R_\theta \end{bmatrix}\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

This $A'$ is rotation about the $x'$ axis either followed or preceded by a reflection in the $y'z'$ plane, that is, rotation through $\theta$ about $\ell(t) = tu_1$, either followed or preceded by reflection in $S^\perp$.

In both cases above the sense of the rotation is determined by a right-hand rule. Point the thumb in the direction of $u_1$; then the fingers curl in the direction of positive $\theta$.

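At first glance it appears that there are two more cases to consider:

(3) $$A' = \begin{bmatrix} 1 & 0 \\ 0 & R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \end{bmatrix}$$

(4) $$A' = \begin{bmatrix} -1 & 0 \\ 0 & R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \end{bmatrix}$$

These fall under the two cases above, however. In fact, by rotating axes in the plane of $u_2$ and $u_3$, the symmetric matrix

$$R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$

may be diagonalized (exercise 10). The eigenvalues are $\pm 1$, so by properly choosing $u_2$, $u_3$ we have

$$A' = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \qquad \lambda_1 = \pm 1$$

Taking $\lambda_1 = 1$, we have reflection in the $x'y'$ plane (case 2 with $\theta = 0$ and the axes interchanged), and taking $\lambda_1 = -1$ we have a 180° rotation about the $y'$ axis (case 1 with $\theta = 180°$ and the axes interchanged).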

Summarizing the information above gives the next proposition. 


Proposition 7.21 A congruence $T: \mathbb{R}^3 \to \mathbb{R}^3$ can be factored into at most three transformations: a reflection in a plane, a rotation (about an axis perpendicular to the plane), and a translation. The reflection is present only when the determinant of the linear part of $T$ is $-1$.

EXAMPLE 7.16 Proposition 7.21 states that the reflection can be taken with respect to a plane perpendicular to the axis of rotation. How is this possible? Suppose that we reflect in a plane and then rotate about a line in that plane; to be specific, suppose we reflect in the $xy$ plane and then rotate 45° about the $x$ axis. How does the decomposition of Proposition 7.21 go in this case?


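Solution Since reflection in the $xy$ plane and rotation through 45° about the $x$ axis have the matrices

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$$

we have

$$T(X) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}X = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}X = AX$$

Notice that $A$ falls under case 3 discussed above. The characteristic polynomial of $A$ is

$$p(\lambda) = \det\begin{bmatrix} 1-\lambda & 0 & 0 \\ 0 & 1/\sqrt{2}-\lambda & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2}-\lambda \end{bmatrix} = (1-\lambda)(\lambda^2 - 1) = -(\lambda - 1)^2(\lambda + 1)$$

Thus the eigenvalues of $A$ are $1, 1, -1$. If we take $\lambda_1 = -1$, then the eigenspace is the null space of

$$A + I = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 + 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1 - 1/\sqrt{2} \end{bmatrix}$$

which has echelon form

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & \sqrt{2} - 1 \\ 0 & 0 & 0 \end{bmatrix}$$

Hence the eigenspace is $t(0, 1 - \sqrt{2}, 1)$, and a unit vector $u_1$ in this direction is obtained by taking $t = 1/\sqrt{4 - 2\sqrt{2}}$. To compute $u_2$ and $u_3$ we could apply the Gram-Schmidt process to $u_1$, $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$ or simply observe that $u_2 = (1, 0, 0)$ and $u_3 = (0, 1, \sqrt{2} - 1)/\sqrt{4 - 2\sqrt{2}}$ will do. Then, setting $U = [u_1 \mid u_2 \mid u_3]$, we have

$$A' = U^TAU = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Thus $T(X)$ is a reflection in the plane spanned by $u_2$ and $u_3$, and the rotation does not appear ($\theta = 0$). A sketch shows that $T$ is a reflection in the plane containing the $x$ axis and cutting the $yz$ plane in a line inclined 22.5° to the $y$ axis (Figure 7.10). ■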
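The computation in Example 7.16 is easy to confirm numerically. This Python sketch (our own check, with helper names of our choosing) forms $A$ as the product of the rotation and reflection matrices, builds $U = [u_1 \mid u_2 \mid u_3]$ from the vectors found above, and checks that $U^TAU$ is $\operatorname{diag}(-1, 1, 1)$ and that $u_3$ is tilted 22.5° from the $y$ axis.

```python
import math


def mat_mul(X, Y):
    """Matrix product, with matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


r = 1 / math.sqrt(2)
reflect_xy = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]    # reflection in the xy plane
rotate_x45 = [[1, 0, 0], [0, r, -r], [0, r, r]]    # 45 degree rotation about the x axis
A = mat_mul(rotate_x45, reflect_xy)                # reflect first, then rotate

n = math.sqrt(4 - 2 * math.sqrt(2))
u1 = [0, (1 - math.sqrt(2)) / n, 1 / n]            # unit eigenvector for lambda = -1
u2 = [1, 0, 0]
u3 = [0, 1 / n, (math.sqrt(2) - 1) / n]
U = transpose([u1, u2, u3])                        # U = [u1 | u2 | u3]

A_prime = mat_mul(transpose(U), mat_mul(A, U))     # expected: diag(-1, 1, 1)
tilt = math.degrees(math.atan2(u3[2], u3[1]))      # angle of u3 from the y axis
print(A_prime, tilt)
```

Since $\tan 22.5° = \sqrt{2} - 1$, the tilt printed here is the inclination described in the last sentence of the solution.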


FIGURE 7.10

If $T_1(X) = b_1 + A_1X$ and $T_2(X) = b_2 + A_2X$ are two congruences, then the composition is given by

$$T_2 \circ T_1(X) = T_2(b_1) + (A_2A_1)X$$

If $A_1$ and $A_2$ both involve reflections, then

$$\det(A_2A_1) = \det(A_2)\det(A_1) = (-1)(-1) = 1$$

and so $T_2 \circ T_1$ is a rotation followed by a translation; no reflection is involved. Similarly, if $T_1$, say, involves a reflection but $T_2$ does not, then $T_2 \circ T_1$ involves a reflection, whereas if neither $T_1$ nor $T_2$ involves a reflection, then $T_2 \circ T_1$ does not involve a reflection. In particular, if $T_1$ and $T_2$ are pure rotations [$b_i = 0$, $\det(A_i) = 1$, $i = 1, 2$], then $T_2 \circ T_1$ is also a pure rotation. Thus any composition of pure rotations is a (single) pure rotation.


EXAMPLE 7.17 Let $T_1$ be a rotation of 30° about the $x$ axis, $T_2$ a rotation of 45° about the $y$ axis, and $T_3$ a rotation of $-60°$ about the $z$ axis. Find the axis and angle of rotation of $T_3 \circ T_2 \circ T_1$.

Solution In radians the rotation angles are $\pi/6$, $\pi/4$, and $-\pi/3$, so we need the matrices $R_{\pi/6}$, $R_{\pi/4}$, and $R_{-\pi/3}$. These can be produced by the function ROT of Section 4.2. The matrices of $T_1$, $T_2$, $T_3$ are given by

      A1←A2←A3←ID 3
      A1[2 3;2 3]←ROT ○30÷180
      A2[1 3;1 3]←ROT ○45÷180
      A3[1 2;1 2]←ROT ○¯60÷180

            A1                    A2                    A3
 1.000 0.000  0.000     0.707 0.000 ¯0.707     0.500 0.866 0.000
 0.000 0.866 ¯0.500     0.000 1.000  0.000    ¯0.866 0.500 0.000
 0.000 0.500  0.866     0.707 0.000  0.707     0.000 0.000 1.000

and the matrix of $T_3 \circ T_2 \circ T_1$ is

      ⎕←A←A3+.×A2+.×A1
 0.354 0.573 ¯0.739
¯0.612 0.739  0.280
 0.707 0.354  0.612

The axis of a rotation is the eigenspace belonging to $\lambda = 1$, that is, the null space of $A - I$. Row-reducing $A - I$ gives

      ⎕←E←ECHELON A-ID 3
 1.00E0    4.34E¯19  6.18E¯2
 0.00E0    1.00E0   ¯1.22E0
 1.63E¯19  5.15E¯19  1.41E¯18

The last column gives the null space, but it is in an inconvenient form. We will use the function BACKSUB from Section 5.5 to get a pair of points on this line. Since the null space is the solution set of $(A - I)X = 0$ or, equivalently, $EX = 0$, two points (one of which will be the origin) are given by

      ⎕←B←1 BACKSUB E,0
¯0.062 0.000
 1.220 0.000
 1.000 0.000

and so the vector B[;1] points along the axis of rotation. To find the angle of rotation we can shift to an orthonormal coordinate system with B[;1] pointing along the $x'$ axis. Then the matrix of $T_3 \circ T_2 \circ T_1$ will be of the form

$$\begin{bmatrix} 1 & 0 \\ 0 & R_\theta \end{bmatrix}$$

where $\theta = \cos^{-1} A'[2;2]$. Recall that the function HSHLDR from Section 5.5 returns an orthonormal basis of $\mathbb{R}^3$ extending an orthonormal basis of the column space of its argument.

      ⎕←U←HSHLDR B
¯0.039  0.773  0.633
 0.773  0.425 ¯0.471
 0.633 ¯0.471  0.614

      ⎕←A←(⍉U)+.×A+.×U
 1.00E0    3.09E¯18 ¯2.06E¯18
¯5.42E¯18  3.53E¯1  ¯9.36E¯1
 2.20E¯18  9.36E¯1   3.53E¯1

      ⎕←THETA←(180÷○1)×¯2○A[2;2]
69.4

Thus $T_3 \circ T_2 \circ T_1$ is a rotation of about 70° in the direction that the fingers of the right hand curl when the thumb points along B[;1]. ■
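As a cross-check on Example 7.17 that avoids ECHELON, BACKSUB, and HSHLDR, here is a Python sketch. It assumes that ROT returns $\begin{bmatrix} \cos & -\sin \\ \sin & \cos \end{bmatrix}$ (an assumption about Section 4.2, though it reproduces the matrices printed above), and `plane_rotation` is a hypothetical helper standing in for the indexed assignments. It then uses two standard identities for a rotation through $\theta$ about a unit axis $u$: $\operatorname{trace}(A) = 1 + 2\cos\theta$, and $A - A^T = 2\sin\theta\,[u]_\times$, where $[u]_\times$ is the cross-product matrix of $u$. The sketch orients $u$ so that $\theta$ lies between 0° and 180° in the right-hand sense, so the unit vector it reports is parallel to B[;1] but may point the opposite way. The formula divides by $2\sin\theta$ and therefore fails for $\theta = 0°$ or $180°$.

```python
import math


def mat_mul(X, Y):
    """Matrix product, with matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def plane_rotation(i, j, degrees):
    """3-by-3 identity with [[c, -s], [s, c]] written into rows/columns i, j
    (0-based); mimics A[i j;i j]←ROT angle from the text."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    M = [[1.0 if row == col else 0.0 for col in range(3)] for row in range(3)]
    M[i][i], M[i][j], M[j][i], M[j][j] = c, -s, s, c
    return M


A1 = plane_rotation(1, 2, 30)      # like A1[2 3;2 3]←ROT ○30÷180
A2 = plane_rotation(0, 2, 45)      # like A2[1 3;1 3]←ROT ○45÷180
A3 = plane_rotation(0, 1, -60)     # like A3[1 2;1 2]←ROT ○¯60÷180
A = mat_mul(A3, mat_mul(A2, A1))   # matrix of T3 o T2 o T1

# Skew part of A: (A[2][1]-A[1][2], A[0][2]-A[2][0], A[1][0]-A[0][1]) = 2 sin(theta) u
w = [A[2][1] - A[1][2], A[0][2] - A[2][0], A[1][0] - A[0][1]]
norm_w = math.sqrt(sum(x * x for x in w))
axis = [x / norm_w for x in w]
theta = math.degrees(math.acos((A[0][0] + A[1][1] + A[2][2] - 1) / 2))
print([round(x, 3) for x in axis], round(theta, 1))
```

The angle agrees with THETA above, and the ratios of the axis components agree with B[;1].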


Proposition 7.21 immediately leads to a geometric description of an arbitrary affine transformation of space. For simplicity we assume that the affine transformation has an invertible linear part, although the general case may be included by taking a zero stretch factor to indicate a perpendicular projection. The proof of Proposition 5.12 easily adapts to give the next result.

Proposition 7.22 Let $f(X) = b + AX$ be an affine transformation of space with $A$ invertible. Then $f$ may be viewed as successive applications of

1. Three stretches with mutually perpendicular stretch lines
2. A rotation
3. (Possibly) a reflection
4. A translation

The stretch factors are the singular values of $A$. The stretch lines are eigenspaces of $A^TA$. The reflection is present if and only if $\det(A) < 0$. If the rotation is nontrivial, the reflection may be taken in the plane perpendicular to the axis of rotation. ■
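The quantities named in Proposition 7.22 can be computed directly. The Python sketch below takes a hypothetical matrix $A$, forms $A^TA$, and finds its eigenvalues with the standard trigonometric closed form for a real symmetric 3-by-3 matrix (a textbook formula we chose for brevity, not a method from this book). The square roots are the stretch factors, and the sign of $\det(A)$ decides whether the reflection is present.

```python
import math


def mat_mul(X, Y):
    """Matrix product, with matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def sym3_eigenvalues(S):
    """Eigenvalues of a real symmetric 3-by-3 matrix, largest first
    (trigonometric solution of the characteristic cubic)."""
    p1 = S[0][1] ** 2 + S[0][2] ** 2 + S[1][2] ** 2
    if p1 == 0:                                    # S is already diagonal
        return sorted([S[0][0], S[1][1], S[2][2]], reverse=True)
    q = (S[0][0] + S[1][1] + S[2][2]) / 3
    p2 = sum((S[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    B = [[(S[i][j] - (q if i == j else 0)) / p for j in range(3)] for i in range(3)]
    r = max(-1.0, min(1.0, det3(B) / 2))           # clamp against round-off
    phi = math.acos(r) / 3
    e1 = q + 2 * p * math.cos(phi)
    e3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return [e1, 3 * q - e1 - e3, e3]


# Hypothetical linear part of f(X) = b + AX.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, -1.0]]
AtA = mat_mul(transpose(A), A)
stretch_factors = [math.sqrt(max(ev, 0.0)) for ev in sym3_eigenvalues(AtA)]
has_reflection = det3(A) < 0
print(stretch_factors, has_reflection)
```

For this $A$ the stretch factors come out as the golden ratio, 1, and its reciprocal, and $\det(A) = -1$, so a reflection is present.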


EXERCISES 7.4*
In exercises 1 through 7 the matrix $A$ of a congruence $f(X) = b + AX$ is given. Decide if the reflection and rotation of Proposition 7.21 are present.

Hint: The reflection is present if $\det(A) = -1$. The rotation must be present if the matrix has complex eigenvalues (but this is not necessary).

[The matrices for exercises 1 through 7 are only partly legible in the source.]

8. (a) Let $T_i$ be a reflection in the plane $\pi_i$, $i = 1, 2$. Suppose $\pi_1$ and $\pi_2$ are parallel. Show that $T_2 \circ T_1$ is a pure translation.
Hint: Choose a coordinate system such that $\pi_1$ is the $xy$ plane, write formulas for $T_1$ and $T_2$, and compute $T_2 \circ T_1$.
(b) Let $T_i$ be a reflection in the plane $\pi_i$. Suppose that $\pi_1$ and $\pi_2$ are not parallel. Show that $T_2 \circ T_1$ is a rotation about the line of intersection of $\pi_1$ and $\pi_2$.
Hint: Choose a coordinate system such that the line of intersection is the $z$ axis, $\pi_1$ is the $xz$ plane, and $\pi_2$ cuts the $xy$ plane in a line making an angle of $\theta/2$ with the $x$ axis. Compute $T_2 \circ T_1$.

9. Show that the determinant of the $n$-by-$n$ matrix $A$ is the product of the eigenvalues.
Hint: If $p(\lambda) = \det(A - \lambda I)$, then $\det(A) = p(0)$, the constant term of $p(\lambda)$. Now factor $p(\lambda)$ into linear factors.

10. By diagonalizing the symmetric matrix $R_\theta\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, show that it represents reflection in a line in $\mathbb{R}^2$.

11. Write an APL function

Name: ANNANG
Input: The matrix of a pure rotation
Output: A vector P with four components. The vector 3↑P is a unit vector pointing along the axis of the rotation and P[4] is the signed angle of rotation, the positive direction being the direction that the fingers of the right hand will curl when the thumb points in the direction of 3↑P.

12. (Computer assignment) Use the function of exercise 11 to compute the axes and angles of rotation of the rotation matrices among the matrices of exercises 1 through 7. (Use the function DET from Section 7.1 to locate the pure rotations.)


7.5* Estimating Eigenvalues (Gerschgorin's Theorem)

If a matrix is diagonal, then the diagonal entries are the eigenvalues. But suppose that the matrix is not diagonal. How close might the diagonal entries be to the eigenvalues?

Each diagonal entry is the center of a disc, the Gerschgorin disc, in the complex plane. The eigenvalues lie within the Gerschgorin discs. If the radii are reasonably small, we may have a useful estimate of the eigenvalues.

DEFINITION 7.7 Let $A$ be an $n$-by-$n$ matrix with (possibly) complex entries. The $i$th Gerschgorin disc, $D_i$, for $A$ is the disc in the complex plane with center $A[i;i]$ and radius

$$r_i = \sum_{j \ne i} |A[i;j]|$$


EXAMPLE 7.18 Let $A$ be the 3-by-3 real matrix of this example (its entries are illegible in the source).

The Gerschgorin discs for $A$ are

$D_1$: center $-1$, radius 1
$D_2$: center 3, radius 1
$D_3$: center 6, radius [illegible in the source]

These discs are sketched in Figure 7.11.

Gerschgorin's theorem (see below) states that the eigenvalues lie within the discs. For the matrix $A$ it also states that there is exactly one eigenvalue in each disc. This has two immediate consequences. The three eigenvalues of $A$ are distinct, so $A$ is diagonalizable. Further, since $A$ has real entries, the characteristic polynomial of $A$ has real coefficients, and hence complex eigenvalues of $A$ would occur in complex conjugate pairs. If $z = x + iy$ lies in one of the discs of Figure 7.11, then so does the complex conjugate $\bar{z} = x - iy$, because each disc is centered on the real axis. A nonreal eigenvalue and its conjugate would therefore be two distinct eigenvalues in the same disc, which is impossible. It follows that the eigenvalues of $A$ are real.


Although the fact that the Gerschgorin discs contain the eigenvalues is not difficult to prove, a proof of the sharper statement concerning the distribution of eigenvalues among the discs is more difficult and will be omitted.

Proposition 7.23 (Gerschgorin's theorem) The eigenvalues of a matrix $A$ lie in the portion of the complex plane enclosed by the Gerschgorin discs. Further, if $k$ of the discs enclose a portion of the plane disjoint from the other discs, then the portion of the plane enclosed by the $k$ discs contains exactly $k$ eigenvalues (counted with multiplicity).

FIGURE 7.11


Proof We prove only the first statement. Let $\lambda$ be an eigenvalue of $A$ and $v$ an eigenvector. Suppose that the largest absolute value among the components of $v$ is $|v[i]|$. Now

$$\lambda v[i] = (\lambda v)[i] = (Av)[i] = A[i;i]\,v[i] + \sum_{j \ne i} A[i;j]\,v[j]$$

Hence

$$(\lambda - A[i;i])\,v[i] = \sum_{j \ne i} A[i;j]\,v[j]$$

Now if $v[i] = 0$, then $v = 0$ and cannot be an eigenvector; hence $v[i] \ne 0$. Dividing through by $v[i]$ and taking absolute values gives

$$|\lambda - A[i;i]| = \left|\sum_{j \ne i} A[i;j]\frac{v[j]}{v[i]}\right| \le \sum_{j \ne i} |A[i;j]| = r_i$$

since $|v[j]/v[i]| \le 1$. Thus $\lambda$ lies in $D_i$, where $|v[i]|$ is the largest absolute value in $v$, any eigenvector belonging to $\lambda$. ■
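A small Python sketch of Definition 7.7 and Proposition 7.23 (function names and the 2-by-2 test matrix are our own, hypothetical choices): it lists the discs, computes the eigenvalues exactly from the characteristic polynomial, and counts how many eigenvalues each disc contains.

```python
import cmath


def gerschgorin_discs(A):
    """(center, radius) of each Gerschgorin disc D_i of the square matrix A."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]


def eig2(A):
    """Eigenvalues of a 2-by-2 matrix, as roots of lambda^2 - (trace) lambda + det."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]


# Hypothetical real matrix whose two discs are disjoint.
A = [[4.0, 1.0],
     [-1.0, -3.0]]
discs = gerschgorin_discs(A)
eigenvalues = eig2(A)
counts = [sum(abs(lam - c) <= r for lam in eigenvalues) for c, r in discs]
in_some_disc = [any(abs(lam - c) <= r for c, r in discs) for lam in eigenvalues]
print(discs, eigenvalues, counts)
```

As in Example 7.18, the discs are disjoint and centered on the real axis, so each holds exactly one eigenvalue and both eigenvalues turn out to be real.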


We will now discuss three applications of Gerschgorin's theorem, all of which touch on previous topics.

ACCURACY OF THE JACOBI ALGORITHM

In Section 5.2 the Jacobi algorithm for the diagonalization of a symmetric matrix was coded into a function JACOBI. If $A$ is a symmetric matrix and Q←JACOBI A, then $Q$ is approximately orthogonal and $Q^TAQ = A'$ is a matrix with diagonal entries $a_{11}, \ldots, a_{nn}$ whose off-diagonal entries are negligible compared to $b$, the largest absolute value in the original matrix $A$. From Gerschgorin's theorem we see that if the largest off-diagonal absolute value in $A'$ is $c$, then the eigenvalues of $A'$ may be taken to be the $a_{ii}$, with an error of $c' = (n - 1)c$ at most. The error $c'$ is close to negligible compared to $b$ (unless $n$ is larger than most APL workspaces can handle). The accuracy with which the numbers $a_{ii}$ approximate eigenvalues of $A$ depends upon how closely the machine calculation (⍉Q)+.×A+.×Q approximates an exact coordinate-change calculation. With the coding of Section 5.2 the difference between $a_{ii}$ and an eigenvalue of $A$ is usually close to negligible compared to $b$.
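The function JACOBI of Section 5.2 is not reproduced here. Instead, this Python sketch uses a basic cyclic Jacobi iteration (our own minimal version, named `jacobi_sweeps`, applied to a hypothetical symmetric matrix) to produce a nearly diagonal $A'$, then applies the Gerschgorin bound $(n - 1)c$ described above, where $c$ is the largest off-diagonal absolute value of $A'$.

```python
import math


def jacobi_sweeps(A, sweeps=6):
    """Apply cyclic Jacobi rotations to a copy of the symmetric matrix A and
    return the nearly diagonal matrix A' = Q^T A Q (Q itself is not kept)."""
    n = len(A)
    M = [row[:] for row in A]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                b = M[p][q]
                if b == 0.0:
                    continue
                a, d = M[p][p], M[q][q]
                # Angle with tan(2 theta) = 2b / (d - a) and |theta| <= pi/4; it zeroes M[p][q].
                theta = math.pi / 4 if a == d else 0.5 * math.atan(2 * b / (d - a))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):             # M <- M J  (columns p and q)
                    mkp, mkq = M[k][p], M[k][q]
                    M[k][p] = c * mkp - s * mkq
                    M[k][q] = s * mkp + c * mkq
                for k in range(n):             # M <- J^T M  (rows p and q)
                    mpk, mqk = M[p][k], M[q][k]
                    M[p][k] = c * mpk - s * mqk
                    M[q][k] = s * mpk + c * mqk
    return M


A = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 1.0]]          # hypothetical symmetric matrix
Ap = jacobi_sweeps(A)
n = len(A)
c = max(abs(Ap[i][j]) for i in range(n) for j in range(n) if i != j)
error_bound = (n - 1) * c      # each eigenvalue lies within this distance of some a_ii
estimates = sorted(Ap[i][i] for i in range(n))
print(estimates, error_bound)
```

Because each rotation is orthogonal, the trace and the sum of squares of all entries are preserved, which gives a quick sanity check on the estimates.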


BOUNDING THE ROOTS OF POLYNOMIALS

In Section 3.7* the sectioning method of approximating roots of $y = f(x)$ was described and coded into a function CENTSECT. Before using CENTSECT, however, one must first locate an interval that contains a solution of $f(x) = 0$. If $f$ is a polynomial, Gerschgorin's theorem gives an interval that contains all real roots of $f(x)$.
We begin with the observation that any polynomial

$$p(x) = (-1)^n(a_0 + a_1x + a_2x^2 + \cdots + a_{n-1}x^{n-1} + a_nx^n)$$

with $a_n = 1$ is $\det(A - xI)$ for some matrix $A$. In fact

$$A = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \qquad (7.4)$$

This fact is exercise 18 in Exercises 7.2. The matrix $A$ is sometimes called a companion matrix for $p(x)$.
The Gerschgorin discs for $A$ are

$D_1$: center $-a_{n-1}$, radius $r_1 = \sum_{i=0}^{n-2} |a_i|$

and, for $i > 1$,

$D_i$: center 0, radius $r_i = 1$

The radius $r_1$ is often overlarge. Since $A$ and $A^T$ have the same characteristic polynomial, we can use $A^T$ as well. The Gerschgorin discs for $A^T$ are

$D_1$: center $-a_{n-1}$, radius $r_1 = 1$

for $1 < i < n$,

$D_i$: center 0, radius $r_i = 1 + |a_{n-i}|$

and

$D_n$: center 0, radius $r_n = |a_0|$

Since the disc $D_1$ lies within the disc with center 0 and radius $1 + |a_{n-1}|$, this gives us Cauchy's theorem.
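The passage from Gerschgorin's theorem to Cauchy's bound can be checked mechanically. This Python sketch (helper names are our own) builds the companion matrix (7.4) from the coefficients $a_0, \ldots, a_{n-1}$ of a monic polynomial, computes the largest value of $|\text{center}| + \text{radius}$ over the Gerschgorin discs of $A^T$ (the column discs of $A$), and compares it with $r = \max(|a_0|, 1 + |a_1|, \ldots, 1 + |a_{n-1}|)$.

```python
def companion(coeffs):
    """Companion matrix (7.4) of a0 + a1 x + ... + a_{n-1} x^(n-1) + x^n.
    coeffs = [a0, a1, ..., a_{n-1}] in ascending order; leading 1 omitted."""
    n = len(coeffs)
    A = [[0.0] * n for _ in range(n)]
    A[0] = [-coeffs[n - 1 - j] for j in range(n)]   # first row: -a_{n-1}, ..., -a_0
    for i in range(1, n):
        A[i][i - 1] = 1.0                            # ones on the subdiagonal
    return A


def gerschgorin_column_bound(A):
    """Largest |center| + radius over the Gerschgorin discs of A^T."""
    n = len(A)
    return max(abs(A[j][j]) + sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))


def cauchy_bound(coeffs):
    """r = max(|a0|, 1 + |a1|, ..., 1 + |a_{n-1}|) from Proposition 7.24."""
    return max([abs(coeffs[0])] + [1 + abs(a) for a in coeffs[1:]])


# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
print(cauchy_bound([-6, 11, -6]), gerschgorin_column_bound(companion([-6, 11, -6])))
```

For the cubic shown, both bounds equal 12, comfortably containing the roots 1, 2, and 3; the bound is crude but costs nothing to compute.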


Proposition 7.24 (Cauchy's theorem) A (possibly complex) root $\lambda$ of the polynomial

$$p(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + x^n$$

must satisfy

$$|\lambda| \le r = \max(|a_0|, 1 + |a_1|, \ldots, 1 + |a_{n-1}|)$$

In particular, if $\lambda$ is real, then $-r \le \lambda \le r$.

EXAMPLE 7.19 Let $f(x) = x^5 + x + 1$. Then Cauchy's theorem states that real roots of $f(x)$ must lie in the interval $[-2, 2]$. Since the degree of $f$ is odd, it must have at least one real root. Using the function CENTSECT from Section 3.7* and the function AT from Example 3.5, we have

      ∇ Z←F X
[1]   Z←1 0 0 0 1 1 AT X
      ∇
      1E¯10 CENTSECT ¯2 2
¯0.755

So $f$ has a root at approximately $-0.755$. ■
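Here is a Python sketch of the same two steps: Cauchy's bound supplies the starting interval, and a root-finder narrows it. Plain bisection stands in for CENTSECT here; it is one sectioning method, but the exact rule used by CENTSECT is not reproduced, and the names `f` and `bisect` are our own.

```python
def f(x):
    return x ** 5 + x + 1


def bisect(g, lo, hi, tol=1e-10):
    """Plain bisection on [lo, hi]; g(lo) and g(hi) must have opposite signs."""
    glo = g(lo)
    if glo * g(hi) > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        gmid = g(mid)
        if glo * gmid <= 0:
            hi = mid
        else:
            lo, glo = mid, gmid
    return (lo + hi) / 2


a = [1, 1, 0, 0, 0]                                  # a0, ..., a4 for x^5 + x + 1
r = max([abs(a[0])] + [1 + abs(x) for x in a[1:]])   # Cauchy's bound: r = 2
root = bisect(f, -r, r)
print(r, root)
```

Since $x^5 + x + 1 = (x^2 + x + 1)(x^3 - x^2 + 1)$, the real root found here is also the real root of $x^3 - x^2 + 1$, approximately $-0.7549$.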


STOCHASTIC MATRICES

A matrix $A$ is stochastic if

1. The entries of $A$ are nonnegative, and
2. The sum of the entries of each column of $A$ is 1 (1∧.=+⌿A).

Stochastic matrices were discussed in Sections 3.6* and 7.3*. In Section 7.3* it was shown that $\lambda = 1$ is always an eigenvalue of $A$, and the question of the existence of

$$\lim_{n \to \infty} A^n$$

was discussed under the general assumption that the eigenvalues of $A$ are $\lambda_1 = 1, \lambda_2, \ldots, \lambda_n$ with $|\lambda_i| < 1$ for $i > 1$. Here we will use Gerschgorin's theorem to prove a slightly different result on the existence of the above limit.

First notice that if there is an invertible matrix $P$ such that

$$P^{-1}AP = \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \qquad (7.5)$$

where $I$ is an identity matrix and $J^n \to 0$ as $n \to \infty$, then

$$\lim_{n \to \infty} A^n$$

exists. In fact

$$\lim_{n \to \infty} A^n = \lim_{n \to \infty} P\begin{bmatrix} I & 0 \\ 0 & J^n \end{bmatrix}P^{-1} = P\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}P^{-1}$$

In Section 7.3* the case where $I$ is 1 by 1 was discussed. It follows from the discussion in Section 7.3* that $J^n \to 0$ as $n \to \infty$ if the eigenvalues of $J$ all have absolute value less than 1. It can be shown that this is true even if $J$ is not diagonalizable.

We have from Section 7.3* (Proposition 7.18) that $\lambda = 1$ is always an eigenvalue of a stochastic matrix. If for a given stochastic matrix with eigenvalue $\lambda$ we can show that either $\lambda = 1$ or $|\lambda| < 1$, then the decomposition of Equation (7.5) will exist and the sequence $\{A^n\}$ will converge. The proposition we will prove is

Proposition 7.25 Let $A$ be a stochastic matrix with $A[i;i] > 0$ for $i = 1, 2, \ldots, n$. Then

$$B = \lim_{n \to \infty} A^n$$

exists. The columns of $B$ are vectors in the eigenspace of $A$ belonging to the eigenvalue $\lambda = 1$. Further, the entries of the columns of $B$ sum to 1.

Proof First we look at the Gerschgorin discs for $A^T$, which has the same eigenvalues as $A$. Since 1∧.=+⌿A, the $i$th disc has center $A[i;i]$ and radius $r_i = 1 - A[i;i]$. Thus if $A[i;i] > 0$, the only complex number in the $i$th disc with absolute value equal to 1 is 1 itself (Figure 7.12). Since the eigenvalues lie within the area of the complex plane enclosed by the discs, we have $|\lambda| \le 1$, and $|\lambda| = 1$ if and only if $\lambda = 1$.

This does not quite finish the proof, since we should verify that the eigenvalue $\lambda = 1$ cannot appear in a Jordan block larger than 1 by 1 before we can assert that Equation (7.5) holds. We leave this fact as exercise 5.


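FIGURE 7.12

The equation 1∧.=+⌿B is true, since the columns of $A^n$ sum to 1 for every $n$. The columns of $B$ are eigenvectors, since

$$AB = A\lim_{n \to \infty} A^n = \lim_{n \to \infty} A^{n+1} = B$$ ■

EXAMPLE 7.20 The matrix

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$

is stochastic, but its diagonal entries are zero, so Proposition 7.25 does not apply. Its powers repeat with period 3 ($A^3 = I$), so the limit does not exist. The characteristic polynomial of $A$ is

$$\det(A - \lambda I) = \det\begin{bmatrix} -\lambda & 1 & 0 \\ 0 & -\lambda & 1 \\ 1 & 0 & -\lambda \end{bmatrix} = -\lambda^3 + 1$$

which has roots $\lambda = 1$, $(-1 + \sqrt{3}\,i)/2$, $(-1 - \sqrt{3}\,i)/2$, all of which have absolute value equal to 1. ■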


EXAMPLE 7.21 The matrix

$$A = \begin{bmatrix} 0.5 & 0 & 0 & 0.7 \\ 0 & 0.2 & 0.6 & 0 \\ 0 & 0.8 & 0.4 & 0 \\ 0.5 & 0 & 0 & 0.3 \end{bmatrix}$$

satisfies the hypotheses of Proposition 7.25 and thus

$$\lim_{n \to \infty} A^n = B$$

exists. We can approximate $B$ using the function TOTHE from Section 3.4 to compute $A^{10000}$:

      ⎕←B←((A TOTHE 20)TOTHE 20)TOTHE 25
0.583 0.000 0.000 0.583
0.000 0.429 0.429 0.000
0.000 0.571 0.571 0.000
0.417 0.000 0.000 0.417

Checking to see if the columns of $B$ are in the null space of $A - I$, we have

      (A-ID 4)+.×B
0.00E0    0.00E0    0.00E0    0.00E0
0.00E0    8.67E¯19  1.30E¯18  0.00E0
0.00E0    8.67E¯19  8.67E¯19  0.00E0
1.30E¯18  0.00E0    0.00E0    1.30E¯18

which is quite good (⎕CT is about 1E¯18 here). The matrix $B$ should be a stochastic matrix, but round-off error has accumulated during the 65 multiplications to the extent that the column sums of $B$ do not fuzz to 1:

      1-+⌿B
8.9E¯15 1.9E¯14 1.9E¯14 8.9E¯15

In this case the eigenspace of $A$ for the eigenvalue $\lambda = 1$ is two-dimensional:

      ECHELON A-ID 4
1.00E0  0.00E0    0.00E0   ¯1.40E0
0.00E0  1.00E0   ¯7.50E¯1   0.00E0
0.00E0  8.67E¯19  2.60E¯18  0.00E0
0.00E0  0.00E0    0.00E0    8.67E¯19

Proposition 7.19 states that if all components of some power of $A$ are nonzero, then this eigenspace will be one-dimensional and the columns of $B$ will all be equal. The verification is left as exercise 6. ■
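For comparison, this Python sketch (helper names are our own) computes $A^{10000}$ for the matrix of Example 7.21 by repeated squaring, which needs only 18 matrix products instead of 65 and so gives round-off less room to accumulate. Solving the two 2-by-2 blocks by hand gives the exact limit: columns 1 and 4 equal $(7/12, 0, 0, 5/12)$, and columns 2 and 3 equal $(0, 3/7, 4/7, 0)$.

```python
def mat_mul(X, Y):
    """Matrix product, with matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_pow(A, k):
    """A raised to the power k (k >= 1), computed by repeated squaring."""
    result, base = None, A
    while k > 0:
        if k & 1:
            result = base if result is None else mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


A = [[0.5, 0.0, 0.0, 0.7],
     [0.0, 0.2, 0.6, 0.0],
     [0.0, 0.8, 0.4, 0.0],
     [0.5, 0.0, 0.0, 0.3]]

B = mat_pow(A, 10000)      # the same power as ((A TOTHE 20)TOTHE 20)TOTHE 25
column_sums = [sum(B[i][j] for i in range(4)) for j in range(4)]
print(B, column_sums)
```

The two distinct column patterns reflect the two-dimensional eigenspace found by ECHELON above.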


EXERCISES 7.5*

1. Give the centers and radii of the Gerschgorin discs for the following matrices. Can you deduce that the eigenvalues are distinct?
[The matrices for parts (a) and (b) are illegible in the source.]

2. For the matrix $A$ below sketch the Gerschgorin discs and the eigenvalues in the complex plane for the given values of $a$.
[The matrix and most values of $a$ are illegible in the source; part (d) is $a = \sqrt{3}$.]

3. Use Cauchy's theorem to find discs in the complex plane that must contain the roots of the polynomials in parts (a), (b), and (c).
[The polynomials are illegible in the source.]

4. A matrix $A$ is called diagonally dominant if

$$|A[i;i]| > \sum_{j \ne i} |A[i;j]| \quad \text{for each } i$$

Show that a diagonally dominant matrix is invertible.
Hint: Show that zero cannot be an eigenvalue.

5. In the proof of Proposition 7.25 it was stated that a stochastic matrix cannot have an associated Jordan block with eigenvalue 1 unless the block is 1 by 1. This exercise indicates why this is true. The general fact is sufficiently indicated by a special case.
(a) Show, by induction, that

$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}^n = \begin{bmatrix} 1 & n & n(n-1)/2 \\ 0 & 1 & n \\ 0 & 0 & 1 \end{bmatrix}$$

and hence the 1;3 component goes to infinity at the same rate as $n^2$.

(b) Let $J_n$ be the matrix of part (a) and consider

$$A^n = PJ_nQ$$

as $n$ goes to infinity. Show that once $n$ is so large that only the $n^2/2$ term of $J_n[1;3]$ needs to be considered, the matrix $A^n$ cannot have both nonnegative entries and columns summing to 1 unless the first column of $P$ or the last row of $Q$ is zero. Hence we cannot have both $Q = P^{-1}$ and $A$ stochastic.

6. Let $A$ be a stochastic matrix with $A[i;j] > 0$ for all $j$ and $i$. Let $v = (v_1, v_2, \ldots, v_n)$ be an eigenvector of $B = A^T$ for $\lambda = 1$ adjusted so that $1 = \max|v_i|$. Suppose $|v_m| = 1$; replacing $v$ by $v/v_m$, we may assume $v_m = 1$. Show that $\sum_j B[m;j](v_j - v_m) = 0$ and deduce $v_j = v_m$ for all $j$.

Hint: $v_m = \sum_j B[m;j]\,v_j$, and the entries $B[m;j]$ are positive and sum to 1.

Note: This shows that the only eigenvectors of $B$ for $\lambda = 1$ are multiples of $(1, 1, \ldots, 1)$. Thus if $B$ has no Jordan blocks with eigenvalue 1, then $\lambda = 1$ is a simple eigenvalue. An argument similar to that of exercise 5 will show that $B$ has no such Jordan blocks.


7.6* Linear Differential Equations

In Section 2.6* the linearization of a nonlinear differential equation at a critical point was discussed. The nonlinear equation was approximated at the critical point by a linear differential equation. In this section we show how eigenvalue-eigenvector calculations may be used to solve a linear differential equation. No knowledge of Section 2.6* is assumed.

We begin with the simplest linear differential equation

$$\frac{dy}{dt} = \lambda y \qquad (7.6)$$

The general solution of Equation (7.6) is

$$y = f(t) = ce^{\lambda t} \qquad (7.7)$$

where $c$ is a constant. This can be checked by substituting $f(t)$ into Equation (7.6). The variable $t$ is real, but (7.7) gives the solution of (7.6) even when $\lambda$ is a complex number. If $\lambda = a + bi$, where $a$ and $b$ are real numbers, then $e^{\lambda t}$ may be computed using Euler's formula

$$e^{(a+bi)t} = e^{at}(\cos bt + i\sin bt) \qquad (7.8)$$

A linear autonomous system of differential equations is a system of the form

$$\begin{aligned} \frac{dy_1}{dt} &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\ &\;\;\vdots \\ \frac{dy_n}{dt} &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n \end{aligned}$$

where the $a_{ij}$ are constants.

If we introduce the notation

$$\frac{d}{dt}Y = \begin{bmatrix} dy_1/dt \\ \vdots \\ dy_n/dt \end{bmatrix}$$

and set $Y[i] = y_i$, $A[i;j] = a_{ij}$, then a linear autonomous system is a system that can be written

$$\frac{d}{dt}Y = AY \qquad (7.9)$$

where $A$ is a matrix of constants.
The next proposition is quite easy to check and is left as exercise 9. For more general statements, see Section 2.5*, particularly Proposition 2.21.

Proposition 7.26 Let $P$ be a constant matrix. Then

$$\frac{d}{dt}(PY) = P\frac{d}{dt}Y$$

Suppose now that the matrix $A$ of Equation (7.9) is diagonalizable, say $P^{-1}AP = \Lambda$ with $\Lambda$ diagonal. Then, introducing the change of variable

$$Y = PZ$$

and using Proposition 7.26, Equation (7.9) becomes

$$\begin{aligned} \frac{d}{dt}PZ &= APZ \\ P\frac{d}{dt}Z &= APZ \\ \frac{d}{dt}Z &= P^{-1}APZ \\ \frac{d}{dt}Z &= \Lambda Z \end{aligned} \qquad (7.10)$$

Equation (7.10) is quite easy to solve. It is the scalar system of equations

$$\frac{dz_i}{dt} = \lambda_iz_i, \qquad 1 \le i \le n$$

where the $\lambda_i$ are the eigenvalues of $A$. These equations are all of the form (7.6) and have the solutions

$$z_i(t) = c_ie^{\lambda_it}, \qquad 1 \le i \le n$$

where the $c_i$ are constants. The equation

$$Y(t) = PZ(t)$$

now gives the solution in terms of the original variables $y_i$.
The procedure may be carried out even when $A$ is not diagonalizable if one knows the solutions of equations of the form (7.9) when $A$ is a Jordan block (see exercise 10).


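EXAMPLE 7.22 Solve

$$\begin{aligned} \frac{dy_1}{dt} &= -40y_1 + 22y_2 \\ \frac{dy_2}{dt} &= -66y_1 + 37y_2 \end{aligned}$$

Solution In matrix form this is

$$\frac{d}{dt}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} -40 & 22 \\ -66 & 37 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

So we wish to diagonalize $A = \begin{bmatrix} -40 & 22 \\ -66 & 37 \end{bmatrix}$. The characteristic polynomial is

$$\det\begin{bmatrix} -40-\lambda & 22 \\ -66 & 37-\lambda \end{bmatrix} = \lambda^2 + 3\lambda - 28 = (\lambda - 4)(\lambda + 7)$$

The eigenspaces corresponding to these eigenvalues are found as follows. For $\lambda = 4$,

$$A - 4I = \begin{bmatrix} -44 & 22 \\ -66 & 33 \end{bmatrix}$$

which has echelon form

$$\begin{bmatrix} 1 & -1/2 \\ 0 & 0 \end{bmatrix}$$

and hence the eigenspace is generated by $\begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$. For $\lambda = -7$,

$$A + 7I = \begin{bmatrix} -33 & 22 \\ -66 & 44 \end{bmatrix} \quad \text{row-reduces to} \quad \begin{bmatrix} 1 & -2/3 \\ 0 & 0 \end{bmatrix}$$

and so the eigenspace is generated by $\begin{bmatrix} 2/3 \\ 1 \end{bmatrix}$.

Any nonzero vector from the appropriate eigenspace will do for a column of the coordinate-change matrix, so to avoid fractions we use the variable change

$$Y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = PZ$$

Then

$$P^{-1}AP = \begin{bmatrix} 4 & 0 \\ 0 & -7 \end{bmatrix}$$

Notice that we need not carry out this computation; we know that $P^{-1}AP$ will be diagonal and we know the diagonal entries. In particular $P^{-1}$ need not be computed, at least not at this stage. The solution of

$$\frac{d}{dt}Z = \begin{bmatrix} 4 & 0 \\ 0 & -7 \end{bmatrix}Z$$

is

$$Z(t) = \begin{bmatrix} c_1e^{4t} \\ c_2e^{-7t} \end{bmatrix}$$

and the solution of the original system is

$$Y(t) = PZ(t) = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} c_1e^{4t} \\ c_2e^{-7t} \end{bmatrix} = \begin{bmatrix} c_1e^{4t} + 2c_2e^{-7t} \\ 2c_1e^{4t} + 3c_2e^{-7t} \end{bmatrix}$$ ■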
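The general solution of Example 7.22 can be checked numerically, and fixing the constants only requires solving $PZ(0) = Y(0)$, never computing $P^{-1}$. This Python sketch uses the hypothetical initial condition $y_1(0) = 1$, $y_2(0) = 0$ (chosen only for illustration), finds $c_1$, $c_2$ by Cramer's rule, and compares a central-difference derivative of $Y(t)$ with $AY(t)$. The helper names `solve2` and `Y` are our own.

```python
import math

A = [[-40.0, 22.0],
     [-66.0, 37.0]]
P = [[1.0, 2.0],       # columns: eigenvectors for lambda = 4 and lambda = -7
     [2.0, 3.0]]
lams = [4.0, -7.0]


def solve2(M, b):
    """Solve the 2-by-2 system M c = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def Y(t, c):
    """Y(t) = P Z(t), where z_i(t) = c_i exp(lambda_i t)."""
    z = [c[i] * math.exp(lams[i] * t) for i in range(2)]
    return [P[0][0] * z[0] + P[0][1] * z[1],
            P[1][0] * z[0] + P[1][1] * z[1]]


Y0 = [1.0, 0.0]        # hypothetical initial condition y1(0) = 1, y2(0) = 0
c = solve2(P, Y0)      # gives c1 = -3, c2 = 2
print(c, Y(0.0, c))
```

Cramer's rule is used only because the system is 2 by 2; for larger $P$, row reduction is the natural choice, as the text goes on to point out.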


In the above example the matrix $P^{-1}$ was not needed, but that is because only the general solution of the system of equations was obtained. If one wishes, for example, the particular solution for which, say, $y_1(0) = 1$ and $y_2(0)$ has some given value, then one must solve the system

$$\begin{bmatrix} y_1(0) \\ y_2(0) \end{bmatrix} = Y(0) = PZ(0) = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$

for $c_1$ and $c_2$. If $P$ is larger than 2 by 2, then solving such a system by row reduction will still be less work than computing $P^{-1}$.

Linear autonomous systems are more general than they appear. Any homogeneous system of differential equations with constant coefficients may be rewritten as such a system.
For example, consider a differential equation of the form

$$\frac{d^2y}{dt^2} + a\frac{dy}{dt} + by = 0$$

If we set $y_1 = y$ and $y_2 = dy/dt$, then $dy_1/dt = y_2$ and $dy_2/dt = -by_1 - ay_2$, and hence the original second-order equation is equivalent to the system

$$\frac{d}{dt}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -b & -a \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

Similarly, a third-order equation could be rewritten as a 3-by-3 system, a system of two second-order equations could be rewritten as a 4-by-4 system, and so on.

This trick is used for the next example, which shows how complex numbers arise and are disposed of in systems with real coefficients.


Exampte 7.23 Solve the differential equation of simple harmonic motion 


ay 


ove Say. 
de 


with initial position (0) = | and initial velocity (dy/di(0) = 0 by the methods of 
this section. 


516 Eigenvalues and Eigenvectors 


Solution Set y, =y and y, = dy/d. Then we have the system 


Os = ty, 
dt 


which is @ linear autonomous system with matrix 
[2s 0] 
—w 0 


The characteristic polynomial of A is 


(A + te = fee) 


The matrix 


Co Aieeeeeeiee 
2 he. meee 0 0 


and the matrix 


| | fe fal 
row-reduces to 
=w? ie 0 0 


so the eigenspace of \y = —iw is generated by 
Al 
—iw is generated by 


(2 


the eigenspace of \ 


and setting 


7.6 Linear Differential Equations $17 


gives 


sie 


rarsls 2) 


The general solution of the system (d/di)Z = AZ is 


re 


and so the general solution of the original system is 


2) 


yp yersgs Parlin 
yo = Pz) =| ital | 


weet + weet! 


The general solution of our differential equation involves complex numbers, although the original problem could be stated in a physical context where complex numbers are meaningless. How does this come about?

By allowing complex numbers into the calculation we have answered a more general mathematical question than the original. We have solved the differential equation for $y$ under the assumption that $y(t)$ may be a complex number as well as a real number. The connection of this general answer with the more special original question is this: real initial conditions produce real solutions $y(t)$.

To see how this works in a specific instance, we finish the solution of the problem.

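We must determine $c_1, c_2$ so that $y_1(0) = 1$ and $y_2(0) = 0$. That is,

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} = Y(0) = PZ(0) = P\begin{bmatrix} c_1 \\ c_2 \end{bmatrix},$$

and hence

$$\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = P^{-1}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = -\frac{1}{2i\omega}\begin{bmatrix} -i\omega & -1 \\ -i\omega & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix},$$

and hence

$$y(t) = y_1(t) = \tfrac{1}{2}\left(e^{i\omega t} + e^{-i\omega t}\right).$$

Using Euler's formula (7.8), we have

$$y(t) = \tfrac{1}{2}\left(\cos\omega t + i\sin\omega t + \cos\omega t - i\sin\omega t\right)$$

or

$$y(t) = \cos\omega t.$$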
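The cancellation of the imaginary parts can also be observed numerically. This Python sketch (not one of the book's APL functions; `shm_y1` and the sample values of $\omega$ and $t$ are illustrative assumptions) solves $P[c_1, c_2]^T = [y(0), y'(0)]^T$ by Cramer's rule and evaluates $y_1(t) = c_1e^{i\omega t} + c_2e^{-i\omega t}$ in complex arithmetic. The imaginary part should vanish up to round-off, and the real part should agree with $\cos\omega t$.

```python
import cmath
import math


def shm_y1(omega, t, y0=1.0, v0=0.0):
    """y_1(t) = c1 e^{i w t} + c2 e^{-i w t} for y'' = -w^2 y, y(0) = y0, y'(0) = v0."""
    lam1, lam2 = 1j * omega, -1j * omega
    # Columns of P are the eigenvectors [1, i w] and [1, -i w].
    p11, p12, p21, p22 = 1.0, 1.0, lam1, lam2
    det = p11 * p22 - p12 * p21            # = -2 i w
    c1 = (y0 * p22 - p12 * v0) / det       # Cramer's rule for P c = [y0, v0]
    c2 = (p11 * v0 - p21 * y0) / det
    return c1 * cmath.exp(lam1 * t) + c2 * cmath.exp(lam2 * t)


for t in (0.0, 0.5, 2.0):
    y = shm_y1(3.0, t)
    print(t, y.real, abs(y.imag), math.cos(3.0 * t))
```

With a nonzero initial velocity $v_0$ the same code gives $\cos\omega t + (v_0/\omega)\sin\omega t$, again with vanishing imaginary part.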




EXERCISES 7.6


In exercises 1 through 4 find the solutions of the linear autonomous system with the given
matrix.


Tents 2) feito est 4. 1 00 Oj 
h HI [1 i] fr: i 0-32 0 
0 3-1 0-43 Oo 

=2 =2 2 =I 


Find the solution that satisfies the given initial conditions in exercises 5 through 8. 


sy 20,10) =0, Mo =1 

6 O42 46 = 0:0) =1 20 =0 

7, 2 4 <0: 90) =1, 40) =0 

0 0: 0) =2. Boy = -1, eat 
Hint: Set $y_1 = y$, $y_2 = dy/dt$, and $y_3 = d^2y/dt^2$.


9. Prove Proposition 7.26.
10. Find the solution of the linear autonomous system


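$$\frac{d}{dt}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

by writing out the individual equations, solving for $y_3$, and then "back-substituting."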


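Hint: To solve

$$\frac{dy}{dt} = \lambda y + f(t),$$

first multiply through by $e^{-\lambda t}$, obtaining

$$e^{-\lambda t}\frac{dy}{dt} - \lambda e^{-\lambda t}y = e^{-\lambda t}f(t)$$

or

$$\frac{d}{dt}\left(e^{-\lambda t}y\right) = e^{-\lambda t}f(t)$$

or

$$y = e^{\lambda t}\left(y(0) + \int_0^t e^{-\lambda s}f(s)\,ds\right).$$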


27 The OR Algorithm $19 


7.7 The QR Algorithm


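The basic QR algorithm is as mysterious as it is simple to state. Given a real square matrix $A$, define a sequence of matrices $\{A_n\}$ as follows:

1. $A_1 = A$.
2. Given $A_{n-1}$, perform a QR factorization (Definition 5.13) with $Q$ square: $A_{n-1} = QR$.
3. $A_n = RQ$.

If $A$ has real eigenvalues, then the sequence $\{A_n\}$ usually converges to an upper triangular matrix

$$\begin{bmatrix} \lambda_1 & * & \cdots & * \\ 0 & \lambda_2 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{bmatrix},$$

where the eigenvalues of $A$ are $\lambda_1, \lambda_2, \ldots, \lambda_m$.

What if $A$ has complex eigenvalues? Since $A$ is assumed real, the eigenvalues of $A$ come in complex conjugate pairs: $\lambda = \alpha + i\beta$ and $\bar\lambda = \alpha - i\beta$, where $\alpha$ and $\beta$ are real. In the case of complex eigenvalues the sequence $\{A_n\}$ usually converges to a matrix of the form

$$\begin{bmatrix} B_1 & * & \cdots & * \\ 0 & B_2 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & B_k \end{bmatrix}, \tag{7.11}$$

where either $B_i = \lambda_i$, a real eigenvalue of $A$, or $B_i$ is a 2-by-2 matrix with two complex conjugate eigenvalues $\lambda_i, \bar\lambda_i$ that are also eigenvalues of $A$.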
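As a rough illustration of steps 1 through 3, here is a small Python sketch. It is not the book's implementation: it factors with Gram-Schmidt orthogonalization instead of the Householder method (HHLDR) used later in the section, and the helper names are hypothetical. For the symmetric matrix with rows (2, 1) and (1, 2), whose eigenvalues are 3 and 1, the iterates should approach a triangular (here diagonal) matrix with 3 and 1 on the diagonal.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def qr_gram_schmidt(A):
    """Factor A = QR (Q with orthonormal columns, R upper triangular).
    Assumes the columns of A are linearly independent."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q[i][k] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R


def basic_qr_algorithm(A, steps):
    """A_1 = A; factor A_{n-1} = QR; set A_n = RQ."""
    for _ in range(steps):
        Q, R = qr_gram_schmidt(A)
        A = matmul(R, Q)
    return A


An = basic_qr_algorithm([[2.0, 1.0], [1.0, 2.0]], 60)
print(An)   # diagonal close to 3 and 1, off-diagonal entries close to 0
```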

Before explaining how this comes about we must develop some facts about 
invariant subspaces. 


Definition 7.8 Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. A subspace $S$ of $\mathbb{R}^n$ is an invariant subspace for $T$ if $T(S)$ is contained in $S$.


Example 7.24 Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be given by $T(X) = AX$.

1. Let $S$ be the null space of $A$. Then $T(v) = 0$ for every $v$ in $S$. Thus $T(S) = \{0\}$, which is a subset of any subspace. In particular $T(S)$ is contained in $S$.




2. Let $\lambda$ be an eigenvalue of $A$ and let $S$ be the corresponding eigenspace. If $v$ is in $S$, then $T(v) = Av = \lambda v$ is also in $S$ (Proposition 4.25), and hence $T(S)$ is contained in $S$.

3. Let $\lambda_1, \lambda_2$ be two eigenvalues of $A$ and let $S$ be the subspace generated by the eigenvectors of $\lambda_1$ and $\lambda_2$. If $v$ is in $S$, then $v = a_1v_1 + a_2v_2$, where $v_i$ is an eigenvector of $\lambda_i$, and hence

$$T(v) = A(a_1v_1 + a_2v_2) = (a_1\lambda_1)v_1 + (a_2\lambda_2)v_2,$$

which lies in $S$.


The third example in Example 7.24 generalizes: the subspace generated by a collection of eigenspaces is an invariant subspace. The next proposition gives a matrix algebra test for invariant subspaces.


Proposition 7.27 Let $S$ be the column space of $V$. Then $S$ is an invariant subspace for $T(X) = AX$ if and only if there is a matrix $C$ such that $AV = VC$.


Proof Recall that a linear transformation carries subspaces to subspaces (Proposition 4.23). The subspace $T(S)$ is the column space of $AV$; hence the proposition follows from Proposition 4.12.


The next proposition gives the basic mechanism by which the blocks $B_i$ of matrix (7.11) are obtained.


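Proposition 7.28 Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be given by $T(X) = AX$, and introduce the coordinate change $X = PX'$. Suppose that $P = [P_1 \mid P_2]$, where the column space of $P_1$ is an invariant subspace of $T$. Then $T(X') = A'X'$, where

$$A' = P^{-1}AP = \begin{bmatrix} B_1 & * \\ 0 & B_2 \end{bmatrix}.$$

Here $AP_1 = P_1B_1$. If, in addition, the column space of $P_2$ is also an invariant subspace for $T$, then

$$A' = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}$$

and $AP_2 = P_2B_2$.


Proof If $T(X) = AX$, then $T(X') = (P^{-1}AP)X'$. Now suppose that

$$P^{-1} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix},$$

where $Q_1$ has $k$ rows. Then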




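$$\begin{bmatrix} Q_1P_1 & Q_1P_2 \\ Q_2P_1 & Q_2P_2 \end{bmatrix} = P^{-1}P = I,$$

so $Q_1P_1 = I$, $Q_1P_2 = 0$, $Q_2P_1 = 0$, $Q_2P_2 = I$. Further, by Proposition 7.27, there is a matrix $C$ such that $AP_1 = P_1C$. Hence

$$P^{-1}AP = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} A\,[P_1 \mid P_2] = \begin{bmatrix} Q_1AP_1 & Q_1AP_2 \\ Q_2AP_1 & Q_2AP_2 \end{bmatrix}.$$

Now $Q_2AP_1 = Q_2P_1C = 0C = 0$, and $B_1 = Q_1AP_1 = Q_1P_1C = C$. Similarly, if the column space of $P_2$ can be chosen invariant, we can show $Q_1AP_2 = 0$.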


The next task is to indicate how the complex eigenvalues of a real matrix produce invariant subspaces of $\mathbb{R}^n$. We begin with the next proposition, the proof of which is left as exercise 27.


Proposition 7.29 Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$ be a polynomial and let $A$ be an $m$-by-$m$ matrix. Let $f(A)$ denote the matrix $A^n + a_{n-1}A^{n-1} + \cdots + a_0I$. If $\lambda$ is an eigenvalue of $A$ with eigenvector $v$, then $f(\lambda)$ is an eigenvalue of $f(A)$ with eigenvector $v$.
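Proposition 7.29 can be checked on a small case. In the Python sketch below (the helper `poly_of_matrix` is a hypothetical name), $f(A)$ is evaluated by Horner's rule for the matrix $A$ with rows (2, 1) and (1, 2), the eigenvector $v = (1, 1)$ belonging to $\lambda = 3$, and $f(x) = x^2 + 2x + 5$. The proposition predicts $f(A)v = f(3)v = 20v$.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def poly_of_matrix(coeffs, A):
    """f(A) for monic f(x) = x^n + a_{n-1} x^{n-1} + ... + a_0,
    with coeffs = [a_0, ..., a_{n-1}]."""
    m = len(A)
    identity = [[float(i == j) for j in range(m)] for i in range(m)]
    F = identity                      # leading coefficient 1
    for a in reversed(coeffs):        # Horner's rule: F <- F A + a I
        FA = matmul(F, A)
        F = [[FA[i][j] + a * identity[i][j] for j in range(m)] for i in range(m)]
    return F


A = [[2.0, 1.0], [1.0, 2.0]]
F = poly_of_matrix([5.0, 2.0], A)     # f(x) = x^2 + 2x + 5
v = [1.0, 1.0]                        # eigenvector of A for lambda = 3
Fv = [sum(F[i][j] * v[j] for j in range(2)) for i in range(2)]
print(F)    # [[14.0, 6.0], [6.0, 14.0]]
print(Fv)   # [20.0, 20.0], that is, f(3) v
```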


It follows from Proposition 7.29 that, for diagonalizable matrices at least, $f(A)$ and $A$ have exactly the same eigenspaces, provided that $f$ takes distinct values at distinct eigenvalues of $A$. A related statement can be proved for general matrices.

Now suppose that $A$ is a diagonalizable real matrix with a pair of complex conjugate eigenvalues $\lambda, \bar\lambda$. If $f(x) = (x - \lambda)(x - \bar\lambda) = x^2 - (\lambda + \bar\lambda)x + \lambda\bar\lambda$, then $f(\lambda) = f(\bar\lambda) = 0$, and the subspace generated by the eigenvectors of $A$ belonging to $\lambda$ and $\bar\lambda$ is the subspace generated by the eigenvectors of $f(A)$ belonging to the eigenvalue 0, that is, the null space of $f(A)$. But $\lambda + \bar\lambda$ and $\lambda\bar\lambda$ are real numbers, so $f(A)$ is a real matrix and its null space can be computed in real arithmetic. This is how the 2-by-2 blocks of (7.11) will arise: via Proposition 7.28, where the invariant subspace is the null space of $(A - \lambda I)(A - \bar\lambda I)$.


Example 7.25 Let

$$A = \begin{bmatrix} -1 & -1 & -1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$

The characteristic polynomial of $A$ is $\det(A - \lambda I) = -(\lambda + 1)(\lambda^2 + 1)$, and the matrix has two complex eigenvalues, the roots of $\lambda^2 + 1 = 0$. Thus the subspace generated by the corresponding eigenvectors is contained in the null space of $A^2 + I$.




The null space of a matrix is the orthogonal complement of the row space. A convenient way to compute the desired null spaces here is to apply the function PERP of Section 5.5 to the transposes of the matrices involved. (PERP A computes an orthonormal basis of the orthogonal complement of the column space of A.)

      ⎕←P1←PERP ⍉A+ID 3
 0.577
¯0.577
 0.577
      ⎕←P2←PERP ⍉(ID 3)+A+.×A
0.000 ¯0.707
1.000  0.000
0.000  0.707
      P←P1,P2
      ⎕←B←(⌹P)+.×A+.×P
¯1.00E0   1.76E¯18  1.50E¯18
 2.60E¯18 1.02E¯18 ¯7.07E¯1
 4.34E¯18 1.41E0    1.23E¯18

Up to fuzz (⎕CT is about 1E¯15 here) this last matrix is

$$\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & -1/\sqrt{2} \\ 0 & \sqrt{2} & 0 \end{bmatrix},$$

and the 2-by-2 block has eigenvalues $\pm i$.
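Readers without PERP at hand can cross-check Example 7.25 with the following Python sketch. It does not compute null spaces: it assumes the basis vectors $(1,-1,1)/\sqrt{3}$, $(0,1,0)$, and $(-1,0,1)/\sqrt{2}$ (which span the null spaces of $A + I$ and $A^2 + I$; their signs are chosen to match the APL output above) and then forms $P^{-1}AP$ column by column, solving $Px = Ap_j$ by Gaussian elimination.

```python
import math

A = [[-1.0, -1.0, -1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
r3, r2 = math.sqrt(3.0), math.sqrt(2.0)
cols = [[1 / r3, -1 / r3, 1 / r3],   # spans the null space of A + I
        [0.0, 1.0, 0.0],             # these two span the null space
        [-1 / r2, 0.0, 1 / r2]]      # of A^2 + I
P = [[cols[j][i] for j in range(3)] for i in range(3)]


def matvec(M, x):
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]


def solve(M, b):
    """Gaussian elimination with partial pivoting for M x = b."""
    n = len(M)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


# Column j of B = P^{-1} A P solves P x = A p_j.
B_cols = [solve(P, matvec(A, cols[j])) for j in range(3)]
B = [[B_cols[j][i] for j in range(3)] for i in range(3)]
for row in B:
    print(["%.3f" % x for x in row])
```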


We now return to the consideration of the QR algorithm. Our approach is somewhat oblique. We begin with a particular example that illustrates the main properties of the algorithm.

The particular example is a geometric one (Figure 7.13). Given an ellipse and its center, find the axes of the ellipse. We assume that the center is at the origin of the coordinate system. Then, taking $X = \begin{bmatrix} x \\ y \end{bmatrix}$, the ellipse has an equation of the form

$$X^TAX = 1,$$

where $A$ is two by two and symmetric. In Figure 7.13 the axes of the ellipse lie along the $x'$ and $y'$ axes; the $x$- and $y$-axes are not shown. The problem is to find the $x'$ and $y'$ axes.

Starting with an initial guess at an axis, $S_0$, we construct a sequence of lines (subspaces) as follows. Given $S_n$, we find $t_n$, the tangent to the ellipse at the point at which the ellipse meets $S_n$ (there are two such points, but the resulting tangent lines are parallel). The next term in the sequence, $S_{n+1}$, is then obtained by dropping a perpendicular from the origin to $t_n$.

FIGURE 7.13

We will see below that this geometric algorithm is equivalent to the QR algo- 
rithm. 

There are two things to notice about the sequence $\{S_n\}$. The first is that if $S_n$ is the $x'$ or $y'$ axis, then $S_n = S_{n+1}$. The second is that if $S_0$ is not the $y'$ axis, then $S_n$ converges to the $x'$ axis, the shorter axis of the ellipse.

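Suppose that the eigenvalues of $A$ are $\lambda$ and $\mu$ with $\lambda > \mu > 0$. Then in the $x'y'$ coordinate system the equation of the ellipse is

$$\lambda x'^2 + \mu y'^2 = 1.$$

The intersection of the ellipse with the $x'$ axis occurs at $x' = 1/\sqrt{\lambda}$, which is closer to the origin than the intersection $y' = 1/\sqrt{\mu}$ with the $y'$ axis. Thus $\{S_n\}$ converges to the eigenspace of the largest eigenvalue.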


Differentiating this equation gives

$$2\lambda x'\,dx' + 2\mu y'\,dy' = 0,$$

and so if $S_n$ meets the ellipse at $(x'_n, y'_n)$, then the slope of $t_n$ is $-\lambda x'_n/\mu y'_n$. Now, if the slope of a line is $m$, then the slope of a perpendicular line is $-1/m$, and so the slope of $S_{n+1}$ is $\mu y'_n/\lambda x'_n$. Thus $S_n$ is generated by $(x'_n, y'_n)$ and $S_{n+1}$ is generated by




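the vector

$$\begin{bmatrix} \lambda x'_n \\ \mu y'_n \end{bmatrix} = \begin{bmatrix} \lambda & 0 \\ 0 & \mu \end{bmatrix}\begin{bmatrix} x'_n \\ y'_n \end{bmatrix}. \tag{7.13}$$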


Now let $T$ be the linear transformation with matrix

$$\begin{bmatrix} \lambda & 0 \\ 0 & \mu \end{bmatrix}$$

in $x'y'$ coordinates. Since the coordinate change $X = PX'$ is orthonormal (Section 4.3), $P^T = P^{-1}$, and in the $xy$ coordinate system $T$ is given by $T(X) = AX$. In terms of the transformation $T$ the description of the sequence of subspaces $\{S_n\}$ becomes

(1) Guess at $S_0$.
(2) $S_n = T(S_{n-1})$.        (7.14)


If we take $S_0$ to be generated by a unit vector $u_0$, a convenient computational algorithm defining a sequence of unit vectors $\{u_n\}$ is

1. Guess at $u_0$.
2. $v_n = Au_{n-1}$.
3. $u_n = v_n/\|v_n\|$.

Step 3 is necessary to keep the components of $v_n$ from growing too large or too small for accurate numerical work. In this form the algorithm applies to any $n$-by-$n$ matrix $A$ and is one version of the power method of approximating an eigenvector belonging to the largest (or dominant) eigenvalue of $A$.
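A minimal Python sketch of the three-step power method follows. The name `power_method` and the Rayleigh-quotient estimate $u^TAu$ computed at the end are additions for illustration, not part of the algorithm as stated. For the symmetric matrix with rows (2, 1) and (1, 2), the unit vectors should approach $(1, 1)/\sqrt{2}$, an eigenvector for the dominant eigenvalue 3.

```python
import math


def power_method(A, u, steps):
    """Repeat v_n = A u_{n-1}, u_n = v_n / |v_n|; return (u^T A u, u)."""
    for _ in range(steps):
        v = [sum(a * x for a, x in zip(row, u)) for row in A]   # step 2
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]                                # step 3
    Au = [sum(a * x for a, x in zip(row, u)) for row in A]
    return sum(x * y for x, y in zip(u, Au)), u   # Rayleigh quotient estimate


lam, u = power_method([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0], 60)
print(lam, u)
```

The error in $u$ shrinks by roughly the factor $1/3$ (the ratio of the two eigenvalues) at each step.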

We are interested in a different generalization of Figure 7.13, however. If $S_0$ is a $k$-dimensional subspace of $\mathbb{R}^n$, then the sequence $\{S_n\}$ of Equations (7.14) usually approaches a $k$-dimensional subspace of $\mathbb{R}^n$. We will refer to this as the $k$-dimensional power method.

This is somewhat shaky ground. In what sense can a sequence of subspaces approach a subspace? Look again at Figure 7.13. The angle between $S_n$ and the $x'$ axis is becoming small. The tangent of this angle is the slope of the line $S_n$, and Equation (7.13) shows us that the slope of $S_n$ goes to zero as $n$ becomes large. In fact, if $m_n$ is the slope of $S_n$, then the slope of $S_{n+1}$ is

$$m_{n+1} = \frac{\mu}{\lambda}\,m_n.$$

Thus, beginning with $m_0$, the slope of $S_0$, we have

$$m_1 = \frac{\mu}{\lambda}m_0, \quad m_2 = \left(\frac{\mu}{\lambda}\right)^2 m_0, \quad \ldots, \quad m_n = \left(\frac{\mu}{\lambda}\right)^n m_0,$$




and since $\mu/\lambda < 1$, $m_n \to 0$ as $n \to \infty$. Notice that the rate of convergence is controlled by the size of the ratio $\mu/\lambda$.

This shows that the sequence of lines $S_n$ approaches the $x'$ axis in the sense that the angle between them goes to zero. It works as long as $m_0$, the slope of $S_0$, is defined, that is, as long as $S_0$ is not the $y'$ axis.

The arguments above extend to the case of an $m$-by-$m$ matrix $A$. Arrange the eigenvalues of $A$ in order of descending magnitude,

$$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_m|.$$

If $|\lambda_k| > |\lambda_{k+1}|$, then the sequence of subspaces (7.14) approaches a $k$-dimensional subspace containing the eigenspaces of $A$ belonging to $\lambda_1, \ldots, \lambda_k$ for almost all choices of $S_0$. This is true even if the matrix $A$ is not diagonalizable.

To implement Equations (7.14) we replace the subspace $S_n$ with an $m$-by-$k$ matrix $V_n$ whose columns generate $S_n$.


Then (7.14) gives a sequence of $m$-by-$k$ matrices $\{V_n\}$ defined by

(1) Guess at $V_0$.
(2) $V_n = AV_{n-1}$.        (7.15)

If, arranging the eigenvalues $\lambda_1, \ldots, \lambda_m$ of $A$ in order of descending magnitude, we have $|\lambda_k| > |\lambda_{k+1}|$, then for almost all choices of $V_0$ the column space of $V_n$ approaches an invariant subspace of $A$.

Since we do not know the eigenvalues of $A$, we do not know how to pick $k$. However, looking at $V$ column by column, we have

$$A[v_1 \mid v_2 \mid \cdots \mid v_k] = [Av_1 \mid Av_2 \mid \cdots \mid Av_k].$$

This means that (7.15) is actually doing many different power methods at once. Looking at individual columns, we have $k$ different one-dimensional power-method sequences. Looking at pairs of columns, we have several ($\binom{k}{2}$, in fact) two-dimensional power-method sequences. In fact, we have $l$-dimensional power methods going for all $l \le k$. Since we do not know what $k$ to pick, we may as well take $k = m - 1$. Then we will be doing all dimensions at once! In what follows we will only try to keep track of one method for each $l$: the $l$-dimensional power method obtained by using the first $l$ columns.

Further, since we have no good method for choosing $V_0$, we will always make the same choice: the first $m - 1$ columns of the identity matrix. With this choice of $V_0$, Equations (7.15) collapse to

$$\text{first } l \text{ columns of } V_n = \text{first } l \text{ columns of } A^n. \tag{7.16}$$

Those values of $l$ for which $|\lambda_l| = |\lambda_{l+1}|$, or for which $V_0$ happens to be a bad choice, will not give convergence. All other values of $l$ will give convergence to an $l$-dimensional invariant subspace of $A$. Suppose we fix an $l$ that gives convergence. Proposition 7.28 suggests the following procedure. First compute $A^n$ for a large $n$. Next introduce a coordinate change $X = PX'$, where $P = [P_1 \mid P_2]$ and $P_1$ has the same column space as $V_n$ in Equation (7.16). Then, since $V_n$ is close to an invariant subspace of $A$, we might expect that

$$P^{-1}AP = \begin{bmatrix} B & * \\ \epsilon & C \end{bmatrix},$$

where $\epsilon$ is $(m - l)$ by $l$ with entries near zero. Unfortunately, this does not work. The problem is that an arbitrary coordinate change $X = PX'$ does not preserve dot products. Two points that are "near" in one coordinate system need not be "near" in the other. In fact, the most obvious choice for $P$, namely $P = A^n$, does not work at all, since

$$P^{-1}AP = (A^n)^{-1}AA^n = A,$$

putting us exactly where we started.

It can be shown that the problem is avoided by using orthonormal coordinate 
changes, which do preserve dot products. 

Notice that we can get, for each choice of $l$, an orthonormal basis of the column space of the matrix $V_n$ of Equation (7.16) by performing a QR factorization of $A^n$:

$$A^n = QR,$$

since the triangular form of $R$ shows that the first $l$ columns of $Q$ generate the same subspace as the first $l$ columns of $A^n$.
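The $k$-dimensional power method (7.15) and the orthonormal bases just described can be combined in a short Python sketch. As an assumption of the sketch, each product $AV$ is immediately replaced by the $Q$ of its QR factorization (computed by Gram-Schmidt, not HHLDR), which leaves the column space unchanged and keeps the numbers in range. With $A = \mathrm{diag}(5, 3, 1)$ and an arbitrary 3-by-2 starting matrix, the column space should approach the invariant subspace spanned by the first two coordinate vectors.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def qr_gram_schmidt(A):
    """A = QR with orthonormal columns in Q (columns of A assumed independent)."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q[i][k] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R


def subspace_iteration(A, V, steps):
    """V_n = A V_{n-1}, with the columns re-orthonormalized after each product."""
    for _ in range(steps):
        V, _ = qr_gram_schmidt(matmul(A, V))
    return V


A = [[5.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
V0 = [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]      # arbitrary starting guess
V = subspace_iteration(A, V0, 60)
print(V)   # third row close to 0: the columns span the x_1 x_2 plane
```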
Here is the form of the QR algorithm that we will implement. 


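QR algorithm Let $A$ be $m$ by $m$. Fix an integer $p$ and define a sequence of matrices $\{A_n\}$ as follows:

$$\begin{aligned} A_0 &= A, \\ \alpha_n A_{n-1}^{\,p} &= QR \quad (\text{QR factorization with } Q \text{ square; } \alpha_n \text{ a scalar}), \\ A_n &= Q^TA_{n-1}Q. \end{aligned} \tag{7.17}$$

The scalar $\alpha_n$ is a scaling factor used to adjust the size of the components of $A_{n-1}^{\,p}$. If $p = 1$ and $\alpha_n = 1$, we obtain the basic form of the QR algorithm given at the beginning of the section. In fact, $A_n = Q^TA_{n-1}Q = Q^T(QR)Q = RQ$.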
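One step of (7.17) can be sketched in Python as follows (again with Gram-Schmidt in place of Householder reflections; `qr_step` and its parameters `p` and `alpha` are illustrative names). The check at the end confirms the identity $Q^TA_{n-1}Q = RQ$ when $p = 1$ and $\alpha_n = 1$, and that a step with $p = 4$ preserves the trace, as any similarity transformation must.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def qr_gram_schmidt(A):
    """A = QR with orthonormal columns in Q (columns of A assumed independent)."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q[i][k] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R


def qr_step(A, p=1, alpha=1.0):
    """One step of (7.17): factor alpha * A^p = QR, return (Q^T A Q, Q, R)."""
    Ap = A
    for _ in range(p - 1):
        Ap = matmul(Ap, A)
    Q, R = qr_gram_schmidt([[alpha * x for x in row] for row in Ap])
    return matmul(transpose(Q), matmul(A, Q)), Q, R


A = [[4.0, 1.0], [2.0, 3.0]]
A1, Q, R = qr_step(A)                  # p = 1, alpha = 1
RQ = matmul(R, Q)
print(max(abs(A1[i][j] - RQ[i][j]) for i in range(2) for j in range(2)))  # round-off level
A4, _, _ = qr_step(A, p=4, alpha=0.01)
print(A4[0][0] + A4[1][1])             # the trace 7 is preserved, up to round-off
```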


Proposition 7.30 Let $A$ be an $m$-by-$m$ real matrix and suppose that the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ of $A$ have been arranged so that $|\lambda_i| \ge |\lambda_{i+1}|$. Let $\{A_n\}$ be a sequence of matrices defined by the QR algorithm. Suppose that $|\lambda_k| > |\lambda_{k+1}|$. Then for almost every $A$ the sequence converges to a matrix of the form

$$\begin{bmatrix} B & * \\ 0 & C \end{bmatrix},$$

where $B$ is $k$ by $k$. The eigenvalues of $B$ are $\lambda_1, \ldots, \lambda_k$, and the eigenvalues of $C$ are $\lambda_{k+1}, \ldots, \lambda_m$.


The phrase "almost every $A$" may be made precise in a probabilistic sense. If one chooses matrices $A$ "at random," then the probability of getting a "bad" matrix $A$ is zero. People, however, do not choose matrices "at random." In fact, the probability that a real matrix chosen "at random" will have an integer entry is also zero.

The important point about the set of such "bad" matrices is that it is unstable. Consider Figure 7.13. In this context a 2-by-2 matrix $A$ will be "bad" if its first column is parallel to the $y'$ axis. If $A$ is perturbed ever so slightly, so that the first column is no longer precisely parallel to the $y'$ axis, then the perturbed matrix is no longer bad and $S_n$ will converge to the $x'$ axis. Most of the time, as the computations proceed, round-off error will provide such perturbations and, after a slow start, the algorithm converges anyway. Here we have a case where the algorithm is actually helped by round-off error!


APL FUNCTIONS 


The idea is to iterate the QR algorithm until one has

$$A_n = \begin{bmatrix} B & * \\ \epsilon & C \end{bmatrix}, \tag{7.18}$$

where $\epsilon$ is negligible compared to, say, the input data. Then the problem deflates to that of finding the eigenvalues of the smaller matrices $B$ and $C$. This suggests that the algorithm proceed by induction on the size of $A$. Since 2-by-2 matrices can be handled by the quadratic formula, it is not hard to start the induction.

The QR algorithm as given in (7.17) often converges slowly. If $|\lambda_k| > |\lambda_{k+1}|$, then, as in the 2-by-2 case of Figure 7.13, the rate of convergence of the $k$-dimensional power method is governed by the ratio $|\lambda_{k+1}/\lambda_k|$. The smaller this ratio is, the faster the algorithm converges. Typically the algorithm given in (7.17) proceeds rapidly at first and then slows down as the problem deflates to small matrices with close eigenvalues. In practice the convergence of the algorithm is accelerated by the technique of shifting, which will now be described. The technique is based on Proposition 7.29.

As the algorithm (7.17) proceeds, the lower right-hand entry $a_n = A_n[m;m]$ approaches $\lambda_m$, the eigenvalue of $A$ of smallest magnitude. By Proposition 7.29 the eigenvalues of $A_n - a_nI$ are $\lambda_i - a_n$, and, if $a_n$ is close to $\lambda_m$, then $\lambda_m - a_n$ is small and hence the ratio $(\lambda_m - a_n)/(\lambda_{m-1} - a_n)$ is also small. Replacing $A_n$ by $A_n - a_nI$ is called a shift.
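The following Python sketch combines the single shift with deflation. It is an illustration under stated assumptions, not a transcription of the book's QR function: it assumes all eigenvalues are real, uses Householder reflections for the QR factorization, deflates when the off-diagonal part of the last row falls below a tolerance, and, like the book's induction, finishes a 2-by-2 block with the quadratic formula. The test matrix is symmetric tridiagonal with eigenvalues $3 - \sqrt{3}$, 3, and $3 + \sqrt{3}$.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def householder_qr(A):
    """A = QR with Q orthogonal and R upper triangular (Householder reflections)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        normx = math.sqrt(sum(t * t for t in x))
        if normx == 0.0:
            continue
        v = x[:]
        v[0] += normx if x[0] >= 0.0 else -normx   # avoids cancellation in v[0]
        vv = sum(t * t for t in v)
        for j in range(n):                         # R <- H R, H = I - 2 v v^T / v^T v
            f = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                R[i][j] -= f * v[i - k]
        for i in range(m):                         # Q <- Q H
            f = 2.0 * sum(Q[i][l] * v[l - k] for l in range(k, m)) / vv
            for l in range(k, m):
                Q[i][l] -= f * v[l - k]
    return Q, R


def shifted_qr_eigenvalues(A, tol=1e-12, max_iter=100):
    """Eigenvalues of a real matrix assumed to have only real eigenvalues."""
    A = [row[:] for row in A]
    found = []
    while len(A) > 2:
        m = len(A)
        for _ in range(max_iter):
            if max(abs(x) for x in A[m - 1][:m - 1]) < tol:
                break                              # the last row has deflated
            a = A[m - 1][m - 1]                    # the shift a_n = A_n[m;m]
            Q, R = householder_qr([[A[i][j] - (a if i == j else 0.0)
                                    for j in range(m)] for i in range(m)])
            RQ = matmul(R, Q)                      # equals Q^T (A - aI) Q
            A = [[RQ[i][j] + (a if i == j else 0.0) for j in range(m)]
                 for i in range(m)]
        found.append(A[m - 1][m - 1])
        A = [row[:m - 1] for row in A[:m - 1]]     # deflate to the leading block
    if len(A) == 2:                                # quadratic formula
        (a, b), (c, d) = A
        half_trace, det = (a + d) / 2.0, a * d - b * c
        root = math.sqrt(max(half_trace * half_trace - det, 0.0))
        found += [half_trace + root, half_trace - root]
    else:
        found.append(A[0][0])
    return sorted(found)


print(shifted_qr_eigenvalues([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]))
```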

If $\lambda_m$ is complex, however, and we are using real arithmetic, then clearly $a_n$, being real, cannot become close to $\lambda_m$. In this case the lower-right 2-by-2 block $B_n$ of $A_n$ (in APL, ¯2 ¯2↑A) should have eigenvalues $\mu_n, \bar\mu_n$ approaching $\lambda_m$ and $\bar\lambda_m$.

Now the characteristic polynomial of $B_n$ is

$$p_n(x) = (x - \mu_n)(x - \bar\mu_n) = x^2 + b_nx + c_n,$$

and by Proposition 7.29 the eigenvalues of $p_n(A_n)$ are

$$p_n(\lambda_i) = \lambda_i^2 + b_n\lambda_i + c_n.$$

If $\mu_n$ is sufficiently close to $\lambda_m$, then, by continuity, $p_n(\lambda_m)$ is close to $p_n(\mu_n) = 0$. Replacing $A_n$ by $p_n(A_n)$ is called an implicit double shift because it can be thought of as simultaneously performing two complex conjugate single shifts using only real arithmetic. It follows from Proposition 7.29 that, for diagonalizable matrices at least, $p_n(A_n)$ and $A_n$ have exactly the same eigenspaces, provided the numbers $p_n(\lambda_i)$ are distinct for distinct eigenvalues $\lambda_i$. A related statement can be proved for general matrices.

The shifted QR algorithm is implemented in the function QR listed below. We will first briefly describe the purpose of each line, then explain the details.


    ∇ Z←S QR A;N;B;Q;K;C
[1]   Z←STRT A
[21 -a(pom story io 
[9] (84 28.ay/0 
(a) cr 


[5]   L1:A←(⍉Q)+.×A+.×Q←HHLDR 4 SCPOW(¯2 ¯2↑A)DBLSHFT A
[6]   Z←Z+.×Q
[7] (1006-C01) 00 
18] «(Wake (1 SHOMOAS@SHY —1sAyerVIL 
[9} 2e2* x ((NAK)TS ORK KY TAD (ON RN)TS OR(K.K) 2A 
    ∇


The right argument A of QR is the matrix to be diagonalized. The left argument S is a scalar that sets the scale for fuzzy comparisons. Normally we take S to be of the same order of magnitude as the lengths of the columns of A.

The function STRT on line [1] starts the induction if A is 1 by 1 or 2 by 2. It returns an orthogonal matrix Q such that $Q^TAQ$ is triangular (that is, has a zero in the 2;1 position) if A has real eigenvalues, and it returns an identity matrix if A has complex eigenvalues. STRT also returns an identity matrix if A is not 2 by 2.

Line [2] stops the function if A is 2 by 2, and line [3] stops the function if A is a zero matrix (compared to S). Line [5] does all the work. The expression B DBLSHFT A computes p(A), where p is the characteristic polynomial of the 2-by-2 matrix B. That is, line [5] begins (on the right) with an implicit double shift.




The function SCPOW computes a power of A, scaled so that the entries do not become too large or too small. The powers are doubled up for efficiency. The expression n SCPOW A computes a rescaling of $A^{2^n}$. We are using n = 4 and hence p = 16 in (7.17). The function HHLDR is discussed in Section 5.5. It uses the Householder algorithm to produce the orthogonal matrix Q of (7.17). Finally, $A_{n-1}$ is replaced by $A_n$.


In APL, matrix multiplications are cheap compared to Householder reductions. The value p = 16 gives four matrix multiplications per Householder reduction. Too high a value of p wastes computation in too many multiplications, and too low a value wastes computation in Householder reductions. In FORTRAN, for example, matrix multiplications are relatively expensive, and p = 1 is the usual choice.

Line [6] accumulates the coordinate-change matrices Q. The function QR returns an orthogonal matrix Q such that $Q^TAQ$ is in the form (7.11). If QR iterates for $l$ steps, computing the successive coordinate-change matrices $Q_1, Q_2, \ldots, Q_l$, then Z will be the orthogonal matrix $Q_1Q_2\cdots Q_l$.

Line [7] limits the iterations to ten, a rather arbitrary choice. There are matrices for which this coding of QR will not converge (see below).

Line [8] detects the presence of the matrix $\epsilon$ in (7.18). Here "small" is taken to mean "negligible compared to S."

Line [9] is the induction step. The results of applying the QR algorithm to the matrices B and C of (7.18) are combined with the results of lines [5] and [6] to give the final result.

Now for the details. 

The most puzzling line is probably [8]. To begin with, it contains some primitive APL functions that we have not had occasion to use previously.

The functions ⊖ and ⌽ (which will be used in coding STRT) are typographically and conceptually similar to the transpose ⍉.

The symbol ⊖ (theta) is ○, backspace, − and the symbol ⌽ (phi) is ○, backspace, |. Just as ⍉A rotates A about a diagonal axis, ⊖A rotates A about a horizontal axis and ⌽A rotates A about a vertical axis.


      A
1 2 3
4 5 6
7 8 9
      ⊖A
7 8 9
4 5 6        The order of the rows is reversed
1 2 3
      ⌽A
3 2 1
6 5 4        The order of the columns is reversed
9 8 7


The scan operator, \, is similar to the reduction operator, /. If v is a vector and f is a scalar dyadic function, then f\v is the vector of partial reductions. Suppose that v has 7 components. Then

(f\v)[1] is f/v[1]
(f\v)[2] is f/v[1 2]
(f\v)[3] is f/v[1 2 3]
   ⋮
(f\v)[7] is f/v


In particular, if v is a vector of zeros and ones as in line QR[8], then ∧\v consists of ones up to the component of v that contains the first zero. Thereafter ∧\v consists of zeros:

      v
1 1 1 0 1 0 1 1
      ∧\v
1 1 1 0 0 0 0 0
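Python has no rotate or scan primitives, but the effect of ⊖, ⌽, ⍉, ∧\, and +\ on the examples above can be imitated as in this sketch (for illustration only; `itertools.accumulate` plays the role of the scan operator, and the variable names are arbitrary).

```python
from itertools import accumulate
import operator

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flip_rows = A[::-1]                        # like APL's ⊖A: order of rows reversed
flip_cols = [row[::-1] for row in A]       # like APL's ⌽A: order of columns reversed
transpose = [list(r) for r in zip(*A)]     # like APL's ⍉A

v = [1, 1, 1, 0, 1, 0, 1, 1]
and_scan = list(accumulate(v, operator.and_))   # like APL's ∧\v
plus_scan = list(accumulate(v))                 # like APL's +\v

print(flip_rows)   # [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
print(flip_cols)   # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(and_scan)    # [1, 1, 1, 0, 0, 0, 0, 0]
print(plus_scan)   # [1, 2, 3, 3, 4, 4, 5, 6]
```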


For matrices A one has (f\A)[;k] = f/A[;⍳k] and (f⍀A)[k;] = f⌿A[⍳k;].
The action of line [8] is now easily understood by working out just how the block of zeros $\epsilon$ in (7.18) is flagged.
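Line [9] takes the orthogonal matrix Z computed by the loop QR[5 6 7 8] together with $Z_1$ and $Z_2$ computed by S QR B and S QR C [B and C here refer to (7.18)] and forms

$$Z\begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix}.$$

Then

$$\left(Z\begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix}\right)^T A \left(Z\begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix}\right) = \begin{bmatrix} Z_1^T & 0 \\ 0 & Z_2^T \end{bmatrix}\begin{bmatrix} B & * \\ 0 & C \end{bmatrix}\begin{bmatrix} Z_1 & 0 \\ 0 & Z_2 \end{bmatrix} = \begin{bmatrix} Z_1^TBZ_1 & * \\ 0 & Z_2^TCZ_2 \end{bmatrix},$$

which should be of the form (7.11).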




To write the function STRT we use the following formula, which is left as exercise 28.


Proposition 7.31 Let

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

have real eigenvalues $\lambda_1, \lambda_2$.


If $b = 0$, set

$$Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$


If $b \ne 0$, set $Q = R_\theta$, where


genie Va=a 
> oT) 
Then $Q$ is orthogonal and
d 
a ‘ 
owo-[} 


In the formula for $\theta$ given in Proposition 7.31 we choose the sign of the radical to be the sign of $a - d$. This prevents a loss of accuracy through cancellation.
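The following Python sketch shows one way to obtain such a rotation. It derives $\theta$ from an eigenvector $(b, \lambda - a)$ and is an illustration under its own assumptions, not a reconstruction of the formula in Proposition 7.31 or of STRT. The eigenvalue is taken as $\lambda = a - \bigl(h + \operatorname{sign}(h)\sqrt{h^2 + bc}\bigr)$ with $h = (a - d)/2$, so the two terms being added have the same sign and no cancellation occurs. As with STRT, the identity is returned when the eigenvalues are complex; for $b = 0$ this sketch uses a 90-degree rotation.

```python
import math


def strt_2x2(A):
    """Rotation Q with Q^T A Q upper triangular, for a real 2-by-2 A with real
    eigenvalues; returns the identity when the eigenvalues are complex."""
    (a, b), (c, d) = A
    h = (a - d) / 2.0
    disc = h * h + b * c              # eigenvalues are (a + d)/2 +- sqrt(disc)
    if disc < 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    if b == 0.0:                      # A is lower triangular: rotate by 90 degrees
        return [[0.0, -1.0], [1.0, 0.0]]
    s = 1.0 if h >= 0.0 else -1.0
    lam_minus_a = -(h + s * math.sqrt(disc))   # same-sign terms: no cancellation
    r = math.hypot(b, lam_minus_a)
    cos_t, sin_t = b / r, lam_minus_a / r      # first column is an eigenvector
    return [[cos_t, -sin_t], [sin_t, cos_t]]


def similarity(Q, A):
    """Q^T A Q for 2-by-2 matrices."""
    AQ = [[sum(A[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(Q[k][i] * AQ[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


A = [[4.0, 1.0], [2.0, 3.0]]          # eigenvalues 2 and 5
print(similarity(strt_2x2(A), A))     # close to [[2, *], [0, 5]]
```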


U ZSTAT A 70:8 


11} 21D 119A 
2] --(4epAn AD /0 
{31 Onde (((TH-/AL1 4}) #2) 44x /A[2 3499/0 


4) 2 

[5] -(SeA[2}9S-Fr (AYO 

[6] ZAOT “30(-T*((T+0)-T<0)mD*~2)-24A(2} 
° 
Let

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

Then the characteristic polynomial of $B$ is

$$\det(B - \lambda I) = \lambda^2 - (a + d)\lambda + ad - bc.$$

This formula is used in coding DBLSHFT.
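For comparison with DBLSHFT, here is a Python sketch of the same computation. The name `dblshft` and its argument order mirror the APL usage B DBLSHFT A, but the code is an independent illustration. With $A$ from Example 7.25 and $B$ having characteristic polynomial $\lambda^2 + 1$, it returns $A^2 + I$; applied to $B$ itself it returns the zero matrix, as the Cayley-Hamilton theorem requires.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dblshft(B, A):
    """p(A) = A^2 - (a + d) A + (ad - bc) I, where p is the characteristic
    polynomial of the 2-by-2 matrix B = [[a, b], [c, d]]."""
    (a, b), (c, d) = B
    trace, det = a + d, a * d - b * c
    A2 = matmul(A, A)
    m = len(A)
    return [[A2[i][j] - trace * A[i][j] + (det if i == j else 0.0)
             for j in range(m)] for i in range(m)]


A = [[-1.0, -1.0, -1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # Example 7.25
B = [[0.0, -1.0], [1.0, 0.0]]                               # p(x) = x^2 + 1
print(dblshft(B, A))   # A^2 + I, with rows (1, 0, 1), (-1, 0, -1), (1, 0, 1)
print(dblshft(B, B))   # the 2-by-2 zero matrix
```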




v 2-8 DBLSHFT A I 
U1) 2e((BE Ye = OBIS 1 Yet) PAY WA-CH/t ASB) IID 1 THA 
r 


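To write the function SCPOW notice that $A^{2^p}$ could be defined inductively as

$$A^{2^p} = \begin{cases} A & \text{if } p = 0, \\ \left(A^{2^{p-1}}\right)^2 & \text{if } p > 0. \end{cases}$$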


    ∇ Z←P SCPOW A
[1]   Z←A
[2]   →(P=0)/0
[3]   A←A+.×A←(P-1)SCPOW A
[4]   Z←A÷(+/|,A)÷×/⍴A
    ∇


The last line rescales the matrix by dividing by the average absolute value of its entries.
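A recursive Python analogue of SCPOW may help (illustrative only; `scpow` mirrors the APL name, and the mean-absolute-value scaling follows the description of the last line). The result is $A^{2^n}$ divided by some positive number, so ratios of entries match those of $A^{2^n}$. The sketch does not guard against a power that becomes the zero matrix, which would cause a division by zero.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def scpow(n, A):
    """A^(2^n) up to a positive factor: square n times, dividing after each
    squaring by the mean absolute value of the entries."""
    if n == 0:
        return [row[:] for row in A]
    B = scpow(n - 1, A)
    B = matmul(B, B)
    mean_abs = sum(abs(x) for row in B for x in row) / (len(B) * len(B[0]))
    return [[x / mean_abs for x in row] for row in B]


S = scpow(2, [[2.0, 1.0], [0.0, 1.0]])   # proportional to A^4 = [[16, 15], [0, 1]]
print(S)                                 # [[2.0, 1.875], [0.0, 0.125]]
```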


Example 7.26 Consider the matrix


2 -2% 10 -18 I 
-l4 13-23 

A= leet et as 1 =2 

7-18 16 -I5 9 

-20 -20 -0 oO tI 


If we apply the function QR, we obtain
10-10 OR A 
V.S9E1 6.611 3.6661 | 5.8aE-1 2 B7EA 
S.A5E) -3.52E 1 1.4661 © 6 BE 1-5 B5E- 
10361 6.6161 4996-1 2 59E-1 5 a8E 
VASE17 8.67619 8 04e-1 2.5961 5 8561 
9.03E-1 4.9762 «7. 1E2 3 28E 12 OTE 
      ⎕←T←(⍉Q)+.×A+.×Q
23660 -7.44c0 1 /40e1 1 ove: 
2.2060 3 .64e0 4.9420 3 2761 
2 ee 17 2.95637 | 4 # a y4er 
1.21617 3476-18 1.1116 | 2.0060 9 70E0 
VA7E17 8.67618 8.67618 1.73617] 1 00Er 


and A appears to have three real and two complex eigenvalues. To "diagonalize" A we proceed as in Example 7.25 to find bases of the appropriate invariant subspaces.


      P←(PERP⍉(2 2↑T)DBLSHFT A),(PERP⍉A-T[3;3]×ID 5),PERP⍉A-T[4;4]×ID 5
      ⎕←P←P,PERP⍉A-T[5;5]×ID 5


6726-1 6.702 7. 56E-1 -3.896-16 -5 7761 
Q1SE 1 -4.18E-1 3 7BE1 4 08E-1 778-1 
G.71E\ 6706-2 3 78E1 © 4 OBE-1 1 24-17 
V04E17 2.92617 3. 78E-1 98. 16E-1 5.7761 
“4.5362 9.091 -1.03E16 4 816-17 4 88-17 
      ⎕←D←(⌹P)+.×A+.×P
41460 2 38e0 188616 1 26616 6.086 16 
7.2760 1860 2.18616 1 56E 1 98E 16 
248616 3 286 17 4 6 936 16 
1.266916 ~7 70617 | 2 0060 } 3 47E 16 
1.52616 2.816 17 8 05616 4 296 16) 1 00F1 


To two significant figures, A has eigenvalues 1, 2, and 10, and the columns of P[;3 4 5] are associated eigenvectors. In addition, A has two more eigenvalues, which are the same as the eigenvalues of the 2-by-2 matrix D[1 2;1 2] or the 2-by-2 matrix T[1 2;1 2]. The corresponding invariant subspace is generated by P[;1 2].


In the above example it is awkward to actually obtain the complex eigenvalues. It is possible to choose a basis of the invariant subspaces that puts the 2-by-2 blocks in a more convenient form. In the example we used the function DBLSHFT to compute the matrix $A^2 + bA + cI$, where $\lambda^2 + b\lambda + c = \det(B - \lambda I)$. Suppose that $S$ is the null space of this matrix and $v \ne 0$ is a vector in $S$. In the case of interest $S$ is two-dimensional and contains no (real) eigenvectors of $A$. Thus $v$ and $w = Av$ form a basis of $S$. Further, since $A^2v + bAv + cv = 0$, we have

$$A[v \mid w] = [w \mid -cv - bw] = [v \mid w]\begin{bmatrix} 0 & -c \\ 1 & -b \end{bmatrix}.$$

It follows that if we use bases of the form $v, Av$, then the 2-by-2 blocks will have the form

$$\begin{bmatrix} 0 & -c \\ 1 & -b \end{bmatrix},$$

and the complex eigenvalues belonging to such a block are the roots of $\lambda^2 + b\lambda + c = 0$.
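The computation can be checked in Python (an illustrative sketch; the helper names are hypothetical). For $A$ of Example 7.25 and $v = (0, 1, 0)$, which lies in the null space of $A^2 + I$ (so $b = 0$ and $c = 1$), the matrices $A[v \mid Av]$ and $[v \mid Av]C$, with $C$ the block above, should coincide. The function `block_roots` then recovers $\pm i$, and for $\lambda^2 - 6\lambda + 25$ it gives $3 \pm 4i$.

```python
import cmath


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def block_check(A, v, b, c):
    """Return (A V, V C) for V = [v | Av] and C = [[0, -c], [1, -b]]; the two
    agree when A^2 v + b A v + c v = 0."""
    Av = [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]
    V = [[v[i], Av[i]] for i in range(len(v))]
    C = [[0.0, -c], [1.0, -b]]
    return matmul(A, V), matmul(V, C)


def block_roots(b, c):
    """Roots of lambda^2 + b lambda + c = 0 (complex when b^2 < 4c)."""
    r = cmath.sqrt(b * b - 4.0 * c)
    return (-b + r) / 2.0, (-b - r) / 2.0


A = [[-1.0, -1.0, -1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # Example 7.25
AV, VC = block_check(A, [0.0, 1.0, 0.0], 0.0, 1.0)           # A^2 v + v = 0
print(AV == VC)                  # True
print(block_roots(0.0, 1.0))     # the roots +i and -i
print(block_roots(-6.0, 25.0))   # the roots 3 + 4i and 3 - 4i
```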




Example 7.27 Let A and P be the matrices defined in Example 7.26. Let us replace the first two columns of P by v, Av, where v ≠ 0 is in the column space of P[;1 2], say v = P[;1].
+PePL A}, (At xPEG NI), 0 24P 
6.711 3.2660 7. 56E"1 “3. B9E~16 5 77E-1 
3191-17560. 78E NA OBEY 5 TTEY 
6 T1E1 3.2660 9.7861 8 0BE) “1 24H 
4046-17 6 28617 3.7861 8.16E) 5 77E-9 
4536-2 6.7560 1.0316 4. 81E17 4 88E“I7 
      ⎕←D←(⌹P)+.×A+.×P
0.0060 «2. S0E1 | -9.13E17 1 S2E-16 8 95E-16 
1.0060 6 o0€0_| 3.00E 17 1 BIE 2 806-17 
0 0060 9 48e 16 | 1 0060 AONE 6 S0E 18 
enoce 2 arete ever re | good] a 21618 
0.0060 «1-026 15-8 09616 4 296 16] 1 O0EF 


Thus, to two significant figures, the complex eigenvalues of A are roots of $\lambda^2 - 6\lambda + 25 = 0$, or $\lambda = 3 \pm 4i$.


The function QR is faster than JACOBI on symmetric matrices. For the 
system used to compute the examples in this book the function QR works three 
times faster than JACOBI on the 6-by-6 Hilbert matrix and six times faster 
than JACOBI on a randomly chosen 10-by-10 symmetric matrix. 


Example 7.28 In Example 5.8 the action of JACOBI on the matrix


1 3 
A=|2 5 
3 


= 


was followed by setting a trace flag. The matrix A was diagonalized in eight iterations. Let us repeat the experiment for QR.


      T∆QR←5
O1 On A 
Ons) 
11961 1966-7 9 6 7RE-14 
1.96E-7 95.16E-1 4. 8E-4 
6.78611 4086-4 1 71-1 
on{5) 
11361 6.94618 2. 716-18 
1036-24 5.1661 4 08E-4 


4.04628 4 08E-4 = 1.71871 




      ⎕←D←(⍉Q)+.×A+.×Q


1.1361 2556-18 -7. 376-18 
“6. 94E°18 1.71E1 0, 00E0 
0. 000 2.0618 -5 166-1 


Two iterations locate the eigenvalue near 11.3, and STRT takes care of the rest. Of course one iteration of QR, involving as it does a Householder reduction, involves more computation than one iteration of JACOBI.


Notice that for symmetric matrices A the columns of S QR A are eigenvectors, and functions such as PERP are not needed.


POLYNOMIALS 


The QR algorithm can be used to find the roots of polynomials. This follows from 
the next proposition, which is exercise 18 of Exercises 7.2. 


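Proposition 7.32 The polynomial $p(\lambda) = (-1)^n(\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0)$ is the characteristic polynomial of the matrix

$$\begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$$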


The multiplier $(-1)^n$ in Proposition 7.32 does not affect the eigenvalues and will be ignored below. The matrix of Proposition 7.32 [without the multiplier $(-1)^n$] will be referred to as the companion matrix of $p(\lambda)$.
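As a small experiment (Python, illustrative only; `companion` takes the coefficients in the order $a_{n-1}, \ldots, a_0$ and is a hypothetical helper), one can build the companion matrix of the polynomial $x^3 - 6x^2 + 11x - 6$ and run the unshifted QR algorithm on it. Because the eigenvalues 3, 2, 1 have distinct magnitudes, the diagonal should approach 3, 2, 1; without shifts this takes far more iterations than the function QR needs.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def qr_gram_schmidt(A):
    """A = QR with orthonormal columns in Q (columns of A assumed independent)."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q[i][k] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R


def companion(coeffs):
    """Companion matrix of x^n + a_{n-1} x^{n-1} + ... + a_0, given
    coeffs = [a_{n-1}, ..., a_1, a_0]: negated coefficients in the first row,
    ones on the subdiagonal."""
    n = len(coeffs)
    C = [[0.0] * n for _ in range(n)]
    C[0] = [-float(a) for a in coeffs]
    for i in range(1, n):
        C[i][i - 1] = 1.0
    return C


C = companion([-6, 11, -6])          # x^3 - 6x^2 + 11x - 6
print(C)                             # [[6.0, -11.0, 6.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = C
for _ in range(200):                 # unshifted QR algorithm
    Q, R = qr_gram_schmidt(A)
    A = matmul(R, Q)
print([A[i][i] for i in range(3)])   # approaches [3, 2, 1]
```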


Example 7.29 Find the roots of $p(x) = x^3 - 6x^2 + 11x - 6$.
Solution The companion matrix is

$$A = \begin{bmatrix} 6 & -11 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$

Taking S = 10, which is roughly the size of the columns of A, we have

      Q←10 QR A
      ⎕←D←(⍉Q)+.×A+.×Q
3.0060 2.55E0. 1: 2681 
“0 00€0 aps 00£0 4 09E0 


8.67619 1.73618] 2 0060 




Thus, to two digits anyway, the roots of $p(\lambda)$ appear to be $\lambda = 1, 2, 3$. We can check this result using the function AT from Example 3.5:


611-6 1 ar 11490 
5.85617 1.39617 92.7817 


which is as close as can be expected for this computer system. The exact roots of $p(\lambda)$ are indeed 1, 2, 3, as can be easily checked.


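Example 7.30 Factor $p(x) = x^5 - 1$.


Solution The companion matrix is

$$A = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$

      ⎕←T←(⍉Q)+.×A+.×Q←10 QR A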


‘10060 1 6SE-19 6 SIE 18 «8 G7E 19 1 OSE 18 
173E 18 | 3 09E 1 9 51E1 3.90618 1 Sze 18 
9.52618 | 9 5161 3.0961 1.736 18 2 696 18 
217619 9.20618 1 30618] 8 OVE 1 S$ BET 
152618 3.79619 1 30618! 5 BBE 1 8 OBEY 


We have the obvious root $\lambda = 1$ in the 1;1 position and then two 2-by-2 blocks, indicating complex roots. Let us "diagonalize" A.


      P1←PERP⍉A-ID 5
      P2←PERP⍉T[2 3;2 3]DBLSHFT A
      P3←PERP⍉(¯2 ¯2↑T)DBLSHFT A


PLPY.(P2L:1],At xP2[; 11). P9{s1].A* wP3(s 1) 
      ⎕←D←(⌹P)+.×A+.×P


1.9060 | 0 00€0 3.BOEN7 2.41635 2046-17 
3 67E 20 | 0 00€0 1 000 9 00£0 917E 18 
S 996-19 | 1.0060 6 181 | “1 01£-34 ~2.03€-17 
251618 0 00E0 3.636 17| 1.73618 ~1.00€0 
4.43618 0.0060 1.54E-16| 1 0060 1.620 


The blocked-out submatrices (which are variants of companion matrices) show that, to three significant figures,

$$x^5 - 1 = (x - 1)(x^2 - 0.618x + 1)(x^2 + 1.618x + 1).$$




Line [7] of the function QR prevents infinite loops. When could these arise? From the discussion of the convergence of the $k$-dimensional power method it appears that whenever a matrix has an eigenvalue of multiplicity 3 or more, an infinite loop should result. This is because QR stops only after the problem has deflated entirely to 2-by-2 matrices (or zero matrices). In fact, QR will work on almost any matrix that is diagonalizable with real eigenvalues. If such a matrix has an eigenvalue $\lambda$ of multiplicity 3, say, then QR will produce a 3-by-3 block that has eigenvalues $\lambda$, $\lambda$, and $\lambda$ and is diagonalizable. But such a block is diagonal in any coordinate system [$P^{-1}(\lambda I)P = \lambda I$], and the problem deflates to three 1-by-1 problems.

Complex eigenvalues of multiplicity two or greater will pose difficulties for QR, since it cannot separate a complex eigenvalue from its conjugate.

If QR stops on line [7] at some level, then the result will be a "triangular" matrix $Q^TAQ$ with a block larger than 2 by 2. Such a block is not too difficult to handle, since it is not a complete mystery but is known to have repeated eigenvalues. We will not pursue the subject, however (see exercise 29).

There are less obvious ways of tricking QR as well (see exercise …). Convergence problems with QR usually become transparent when a trace flag is set on lines [5] and [6].

We close the section with an example of a matrix whose eigenvalues are ill
conditioned (i.e., unstable). The behavior exhibited is rather typical of
nondiagonalizable matrices.


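Example 7.31 The companion matrix of $(\lambda - 1)^5 = \lambda^5 - 5\lambda^4 + 10\lambda^3 - 10\lambda^2 + 5\lambda - 1$ is

$$
A = \begin{bmatrix}
5 & -10 & 10 & -5 & 1\\
1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0
\end{bmatrix}
$$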

Since the eigenvalues of A are all of the same magnitude, the QR algorithm, if 
done in exact arithmetic, cannot find an invariant subspace. Setting a trace flag on 
line [5] of QR, we have 


T∆QR←5
Q←10 QR A

R151 

11360 7.B5E 1 9 55E-1 2.670 6 24E0 From the first 

5.6163 1.0620 1.7160 2.5160 iieo) uesaton 

14GE13 -9.4TES 9.9261 2BSED 7 O5EO A ae 

105613 3.15613 1.76E-3 9.36E1 — 4.84E0 —Jawer left 

S726 15] 4.18619 1.31610 6.0064 8 B7E1 


538 Eigenvalues and Eigenvectors 


ons) 

101e0 7.1461 6.3561 2.18€0. “6 860 

“6.7465 101€0 © -1.69£0 © 2. S0E0 © 9. S90 

5.34615 3.79695 9 996-1 -2.95€0 7 74E0 

152-18 “6 146-15 1.966-5 9.9961 5.2460 

y. 26 22 3.666 19] 1.816 14 6 726-6 9 88E 1 

QR( 5) 

10060 7.0861 -6.026-1 “2.1960 -6.70€0 

7.9087 1.0060 1.69602. S0E0. «0 

D.1SE19 ~4 406-7 1.0060 © 2.96€0 ©7820 

1006-23 “8.69619 2276-7 9.9961 5 2980 

S766 26 221629 2.666181 7.7468 9.996 

QR(5) 

10060 7.0761 ~5. 996-1 2.1260 6 G9E0 

B.s06-8 1.0060 1.6960 2.5060 “8840 

2806-22 2.30611 1.0060 -2.96€0 7 82e0 

106-25 9.37616 5.596-8 1.0060 5.2960 

130628 1 77E8a 7.4dE 181 1.566°11 1.0060 

OR(S) 

1 000 7O7E-1 “5. 98E-1 2.1260 66960 The S-by-5 

7 646-8 1 00E0 1.6960 2.5060 9 64E0 problem 

2.756 24 6 OSE 14 9 99E 1 2.9660 7 e260 deflates to a 

2.73628 1 026-17 -1.296-7 1 0060s 28e0 

1436-37 4 50E-34 1 076-26 1 526-23] 1.00€0 

ons) 

10060 7 07E-) 5 98E-1 2 120 The 4-by-4 

Lose? 1 6960-2. 50E0 detlates to two 

6.196 30 9.9961 2 96e0 Ponzi 

B.14e-33 1208-30! 1 296-7 1 o0€0 ay met 
The resulting block-triangular form is

+Te(HQ) + xAv x0 

1006 7076-1 “5 986-1 “2 1260 6.680 

st.oae'7 10060 _-1.69€0 -2.50€0.-. 6420 

Vo4e-17 1.79618] 9.9961 2.9660 7.8260 

7596-18 2.606-18|-1 296-7 10060 5 2960 

\41E18 4346-19 8 e7e-19 0 00€0] 1 000 


We have two 2-by-2 blocks, indicating complex eigenvalues, and a real eigenvalue
near 1.

The matrix A has the eigenvalue 1 with multiplicity 5. The matrix Q is orthogonal
to about seventeen digits:


(ID 5)−(⍉Q)+.×Q
0. 00E0 217E-18 -3.04€-18 -6 296-18 -1 95E-18 
217618 0.0060 2606-18 -1 526-18 -9.76E-19 




“3.04618 -2.60E-18 1 73618 1 73618 1.08E-18 
6.29618 -1.52E-18 ~1.73E-18 -3.47E-18 -1. 146-18 
1956-18 -9.76E°19 1 OBE-18 -1.14€-18 8 67E~18 
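The printout above measures orthogonality as I − QᵀQ. The Python sketch below makes the same measurement on a Householder reflector $H = I - 2uu^T/(u^Tu)$, which is symmetric and orthogonal in exact arithmetic. The book's HHLDR is not reproduced here; the formula, the sample vector, and the function names are our own choices. In IEEE double precision (about 16 significant digits, fewer than the roughly 18 carried in the text), the defect should be a small multiple of machine epsilon.

```python
def householder(u):
    """Householder reflector H = I - 2 u u^T / (u^T u); symmetric and orthogonal."""
    n = len(u)
    uu = sum(x * x for x in u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / uu for j in range(n)]
            for i in range(n)]


def orthogonality_defect(q):
    """Largest |entry| of I - Q^T Q; zero in exact arithmetic when Q is orthogonal."""
    n = len(q)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            qtq = sum(q[k][i] * q[k][j] for k in range(n))
            worst = max(worst, abs((1.0 if i == j else 0.0) - qtq))
    return worst


if __name__ == "__main__":
    h = householder([3.0, 1.0, 4.0, 1.0, 5.0])   # hypothetical sample vector
    print(orthogonality_defect(h))
```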


How near to 1 is the real eigenvalue of T?


)DIGITS 18 
WAS 3

T[5;5]
0 999606805561)783 


So T[5;5] differs from 1 in the fourth digit. Similarly, computation shows that
the complex eigenvalues of T are of the form 1 + z, where z is a complex number
with |z| ≈ 10⁻⁴.


The function QR produced a result so quickly because inexact arithmetic
quickly perturbed A to a diagonalizable matrix. The eigenvalues of A are ill
conditioned because a perturbation in the eighteenth digit of Q produces a change in
the fourth digit of the eigenvalues.
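A simple model shows why a tiny perturbation moves these eigenvalues so far. Raising the corner entry 1 in the first row of A to 1 + ε changes the characteristic polynomial to (λ − 1)⁵ − ε. Its roots are 1 + ε^{1/5}ω^k with ω = e^{2πi/5}. With ε = 10⁻¹⁸ the roots move by 10^{−3.6} ≈ 2.5 × 10⁻⁴, a change in the fourth digit. The Python sketch below evaluates these perturbed roots. It models a perturbation of one entry of A rather than the rounding in Q that actually occurs, so it illustrates only the scale of the effect.

```python
import cmath
import math


def perturbed_roots(eps, n=5):
    """Roots of (x - 1)**n = eps, namely x = 1 + eps**(1/n) * exp(2*pi*i*k/n)."""
    radius = eps ** (1.0 / n)
    return [1 + radius * cmath.exp(2j * math.pi * k / n) for k in range(n)]


if __name__ == "__main__":
    for eps in (1e-18, 1e-12, 1e-6):
        shift = max(abs(z - 1) for z in perturbed_roots(eps))
        print(f"eps = {eps:.0e}: roots move by {shift:.2e}")
```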


EXERCISES 7.7 


(Computer assignment) In exercises 1 through 12 use the function QR to find the eigenvalues
of the given matrix. (You may wish to use the function CXEIG of exercise 23 to
compute complex eigenvalues.)


1 o-t 2. 
aes 
2 0 
an A 
3. a as 4 
eer 
10 6 
eS 3p 
apy Si 6 
42 
6.2 
ao 
7. Bh ca Zia 12) & 
—4 -3 -2 -2 
—4 =2 -3 -2 
aces) 
ee = 10, 
3-2 1 =4 
ee ae 
8 0 —4 13 




hi fat <2: 322 2 fl 0.4 =2 
4 3) 410 230 -2 
2 4 UW -8 OMB) r=8 
2 4 2 -5 244 -3 


(Computer assignment) In exercises 13 through 20 factor the given polynomial into linear
and quadratic factors with real coefficients. (You may wish to use the function CXCOEF
of exercise 22 to compute the coefficients of the quadratic factors.)
13, xt xt Ith +6 
14, xt xt ant pag? 
15. $x^5 - 15x^4 + 85x^3 - 225x^2 + 274x - 120$
16. $x^4 - 9x^3 + 45x^2 - 87x + 50$
17. $x^4 - 8x^3 + 42x^2 - 80x + 125$
18, yt — 2x4 ft + 2x 2 
19, — dx — x4 + 30K = 66x% + 64x — 24 
20, x? — 6x! 4 2309 + Bx? — 26x — 100 
21. Let B be a 2-by-2 block with complex eigenvalues. Show that if P[;1] ≠ 0 and
P[;2] = BP[;1], then

$$P^{-1}BP = \begin{bmatrix} 0 & -c\\ 1 & -b \end{bmatrix},$$

where $x^2 + bx + c$ is the characteristic polynomial of B.
Hint: See the discussion preceding Example 7.26.
22. Use the result of exercise 21 to write an APL function


Name: CXCOEF
Input: A 2-by-2 matrix B with complex eigenvalues
Result: The vector (b, c) where $x^2 + bx + c = \det(B - xI)$


23. Use the result of exercise 21 to write an APL function


Name: CXEIG
Input: A 2-by-2 matrix B with complex eigenvalues
Result: The vector (α, β) where the eigenvalues of B are α ± βi


24. (Computer assignment)


-1 -2 -2 0 
—4 1 -2 -2 
4) <0) 3: a . 
-4 4 0 -3 


(a) Show that 10 QR A returns an identity matrix.
(b) Define a “random” orthogonal matrix Q by


Q←100 HHLDR ?4 4⍴100


Show that 10 QR (⍉Q)+.×A+.×Q is upper triangular.


(c) Explain why the function QR works on QᵀAQ but not on A.


Hint: Compute 4 scrow(-21)DBLSHFT A “by hand” (i.e., in calculator
mode).


25. Let $A = \begin{bmatrix} 0 & I\\ I & 0 \end{bmatrix}$, where I is an identity matrix of size 2 by 2 or greater. Show that the
matrix Q computed on line [5] of QR is an identity matrix and hence line [5] does not
change A at all.


26. Let V, W be n-by-k matrices of rank k. Show that V and W have the same column
space if and only if V = WC, where C is k by k and invertible.


Hint: Propositions 4.12, 2.9, and 2.5.
27. Prove Proposition 7.29.

Hint: Exercises 11, 12, 13, 14 of Exercises 7.2.
28. Prove Proposition 7.31.

Hint: Compute the roots of det(A − λI), obtaining λ₁, λ₂. Then row-reduce A − λI.


29. Let A be a 2m-by-2m matrix with eigenvalues α ± βi, each repeated m times.
Let t be the sum of the diagonal elements of A and let s = det(A).


(a) Show that $\det(A - \lambda I) = \lambda^{2m} - t\lambda^{2m-1} + \cdots + s$.
Hint: Elaborate on the proof of Proposition 7.12.


(b) Show that, since $\det(A - \lambda I) = [\lambda - (\alpha + \beta i)]^m[\lambda - (\alpha - \beta i)]^m$,


$$2\alpha = t/m \quad\text{and}\quad \alpha^2 + \beta^2 = s^{1/m}$$


30. Sketch the geometric algorithm (Figure 7.16) in the case where the ellipse is
replaced by a hyperbola.
Note: The asymptotes must be assumed tangent to the hyperbola (tangents
at infinity), and both of the hyperbolas $\lambda(x')^2 - \mu(y')^2 = \pm 1$ must be drawn in.
What happens with the rectangular hyperbola (x = +1? Explain, 




Answers to Selected Exercises 543 


APPENDIX A 29. For Univac 1100 series machines, n = 49, n = 50 is slightly worse. 
“ Ke(4100)-100 
3 ff 8dx =} (47K+2)-100 




bs Ke0(4100)-100 
33. ff sinxade = (4/10K) x0=100 




2 (1100) 
35. f cos3esin Seay =o MORR(s1002-100 


: J, (+/(203%K) x105xK )x0-50 

= Answer machine dependent — very close to zero. 
37. 224456689 3. 35-35-35 
4-62-6456 -6 B03 


45. v.wis (a, 10q2 Bye Bigs s+ Bags Which has n+ m oF (pV) +oW components, 


) 


A. iv .Wisay bay ton ty + Bit Be to + Ay or (Sa) + (SP 


EXERCISES LI * pte if Visa vector (these are vectors ) 
Nueyacae ON Saat) “Sipesaneml sa (9!, eeabes reX\0 if Visa scalar \with one component 
I, 793*-2 13, 02180 (Notice that this is monadic =.) 15. 100842 $1. 46 Pn) 5 6 $2 8 a6 59) con (sin 2x) 
17, 1¥<(209)+2 (Notice that seed = 1/c0s.) 19. ((504)+2)=(605)+3 
2 1093 23. (roty=-Conyerd 25. exxy 2% ((XeR)eVeR DEW? 60. (a) Years = 1926 + 2534 steel 
29, BxH-2 My (OHKAS2)~3 33. (1 4Xe2) oH? 61. The result is given in Equations (1.4) 
BS, (eXe.2)0t-4e2 97, B39. = 9 8S, é 
4c 4, 0 Sh ot $33 55. | 576 5% 1 OL ac/b erLIOoNces) <8) 
6. z—b+cetd 64 ab—c) 65. x" 67. tan(Tantx) = x 78.5 
69. 1 + e% (The leftmost '+' is monadic) 71, Hint: Which gives the 65. Kear 
largest answer on the computer: (+01) of (o1)ee1” 73, «475. 0 AM MU JME) 
Tei 9 it 203 202 199 236 175 160 146 
67. (a) Kear 
EXERCISES 1.2 {MILAGE ~ AM.G{ 19K} 
13.9 15.5 149 15.8 12.7 11 10.7 
te Spats 3. (-.-") S103050 where AM is the answer to exercise 65 
1000 9% 5-922 I oorz004aa 
13, (49,832, —49,832, 49.832, —49.832) Recall that log, (b*) = x. EXERCISES 1.3 
5. + Vexe 24 = 
Nee ASS ERR eee ze La 342 ced T3081) 4 
I. ey “1-1 3 -2exe 14 10.2 1 0 Ans. = ~.0367, 0, 0210 ae ra sea, Su 3 
19. (6x1 0 1)+9x0.1 0 1 00 0 0 
636 aa on A 
(6x1, (243).1)47 2 322 fai & 
2h 6s 57.5 te Sear u 3. 92 5. 1 203 
2. +130 2s e1100 2. 121420, 120 er. 234 2 5 eae 
465 +eneKe2 “et 22165 8 Aba at 
338500 tant as 612 
a 45 64 




S44 Appendix A 


16, 


2. 


29. 


TE 


3. 


37 


4. 


4s. 


(a) Size is2 2 (c) Size is 2. 


(i) Size is 1 1. 


4 
2 
2 
1 


i423 
ae daa 


(e) The z-scores for CON are 
=140 =130 —1.00 
100.9070 
9 9039 
HIS) 925) 86: 
0S 02 —46 
-42 -S1 16 
-a 0-38 
43057) =.07 
=1.70 =1.60 
No 65 = 
26 100 —.84 
170 2.00 1,00 
AiG) S880 te 77 
Lo =41 2.70 
=130 =1.30 =.83 


bevto 
des 20 
vitite 

=1000 


Fede 110) 6 
4/4/01) *. «Bou 
~ 933013 
feJeo(430)-30 
Ae (0-30) +2 
Axrer/ Ds tod 
10.2 
IeJn0( 125) -25x2 
AAW (0+50) +2 
BAxt/+/301 xed 
-0.521 


(b) AlAs i] = 0 if and only if k|/ is zero — that 


(k) Size is .1. 


19. 


te) 
(m) 


Size is1 2. (g) Size is 10 1. 
Size is 2 2. 
2 12 ZL 1s 2. 4% 
ze Pa: 
2s c+ Ree er Ma Ti | 
12 ah tee a 
rs 
24 
66 on 
as -92 
110.33 
-270 -1.40 
74 1.30 
7 150 
—60  —57 
-95 -1.30 
9% 61 
—58 05 
82 42 
32-47 
=100 1.20 
=37 =1.60 
37-15 
3s Jerr 
Jni20 
See CInRy sede 
36400 
39 fedeonensy 
eres tecadsy 
186.093 
43 JeJe0(430)-30 


BAx(0+30) 92 
BAx+/a(201)*.xt0d 
0.208 


is, if and only if k divides 




EXERCISES 2.1 
L @ 0.4) (©) (0.18) (©) (6a) 
(8) (—2, -6, =4) Notice that 3 1p, 1=pv and dz pw. 
2 eT] 100 
() L-1 7 @ for i (f) Not defined 
001 
ey Tae =A 2 
et hart) 8 00 
| Viciaey eae Oke g |; 0 
2 etna Li 
ai 43 egies 
Ia 31 hs al © Jo 00 
peas) 
’ c 
S. Suppose thara =[% 7] and #=[} j].araz = Ba.then[? ‘l=[5 ole 


a =d and ¢ =0. Another properly chosen B will show 6 = 0, 


31 =47fK) [7 
7. (a) : ‘lG=f] @ jor afixj=fi 
% y 40 O} ly, i) 
3 L)fx 0) 7 =1 Olfw 
() ‘|= [3a (gs) [12 “[- o+tt 6 i | 
, oo4 Ue loo alle 
9% (a) 12 (© 14 (a2 yd (Dt 
&) 1 im 1 (@) 010 @ 2 2 
10, (a) AK. 00 (c) Ff 00 2A 
11. For example, (ATA): 1] = Sy AMM: AJALA: 1] = Sy AlAs 1]! = 0 shows AlAs I] = 0 
for all k 


EXERCISES 2.2 


1 rfl Sf Slowo= 


3. (aly + Blyd =aL,A + BlgA =al + fl (a+ By =l 

5. (a) (BAA AB) = BAAAB = BIB = BB = 1 shows (AB)! = BoA"! 
by remarks following Proposition 2.5. 
(b) Choose 2-by-2 matrices at random until you find a pair that works. Use Propo- 
sition 27 to compute the answers. 

7. AA =I shows A = (A!) by remarks following Proposition 2.5, 
A(R, ~ Bz) = AR, — AR, = 1-1 =0 

I. (by fd, 


| then X-K=1N gives a nonzero solution of AN 




12. (b) (1,4) (a) (0.0) 
13, (a) 13 (c) 14 Ce) v2 
15. (a) No leftinverse (c) 0 007 0 019 0 086 0 024 
0.076 0 08 0 431 0 130 
0.022 0.018 0.102 0 027 
(e) No left inverse 
16, (b) No solution (d) (919, ~1.14,2.52, -1.77) (fF) (.661, .188, 248) 
17, (a) 6496-3 -8,91E-3 (ce) ~0.006 “0.006 -0 014 
5.05E-3 7 76E-3 0.003 -0.009 0 003 
5.2564 9 496 6 0.001 -0 010 -0 008 
9 002 0 000 -0.004 
(e) 0.007 0.006 ~0.011 0 001 
0.001 -0.007 0.003 -0 003 
© 002 0 008 -0 015 0 008 
0,004 0,006 ~0 006 -o.003 
0.000 0.006 0.000 -0 003 
18 (2) (B+ C)DMi:/ 


= DUB + On AIDA: 


= SB: ky + Cli: AD DIK: /] " 
+ 




12 + 1.8y ~ 832 + 08903 


y= B2~ 264 


y= 82 ~ 26, ~ 0001342 


The line y = .82 − .26x and the parabola y = .82 − .26x − .00013x² cannot be
distinguished on the plot above.


San 7 13. Age = -4.21 + 218 height + 0224 weight 
=D AURIDUG) +S, CADIZ) Height = 15 + 1.6 age + .387 weight 
Weight = 2: 7 06 hei 
= (BD); /| + (CDM /} ‘5 ee ee ae ere 
(4) (a + BAM) = (0 + BAUS /) = otis} + Bali) , b = = 1.091 (see plot on next page) 
= (aA) + (BANE /) 
= (0A + BANS) 
EXERCISES 2.4 
face 1 4-12 
= ; B=[']as 
EXERCISES 23 1 tinear a= [5 3] 3. amine 8 =[3].4=[5 ~'7] 
) 
0 lo 0 10 . 
2 (b X=(C-B)U=|2 -1 1] (@) Nosolution 5. Affine, B=[I,4=[2 3 4] 7. Quadratic form, 4 i 
3 32 0014 0 
(0 0 (hy) X= HW =(-3 -10) 
ae | 
3. (a) ZA = BL, — L,)A = BLA — BLA =B—B=0 9. Quadratic, e=4,8=[9 12 -I,A=| 9 0 
(6) Take the matrix constructed in (b) and cither delete rows or add rows of zeros ae 
Until it is square. t 
5. (a) (2,3) is a solution I. Affine, B=|1],4=1 13. Linear, 4 = 
(c) 1 164 is nota solution. ' 
(©) (1, ~2,1) isa solution. fonoenee Irie peo are PS TT 
6. (b) (1.2, 313, .702) is not a solution. (d) (1,1, -1, 1) is @ solution. . 
9. 3942 19. Linear, A=&3 129 1 © 0 2003004 00 0 


Otherwise 23. Quadratic, ¢ = 1, B= (1-17), A 




1s 
4 3. (~sin t,c08 1,1) (using (» Df) = (oY), »X, we see that Df'is a vector). 
B 7. Df(X) = 3xXe2 % (4 8 Df) =2oX; Dfis/\=0 it Z/ 
2 a : fen _ fen ifi= 
i eS aael a Beh fen fi =] 
ay tt = 857 Be (DCAMEN =F = 19 otherwise 
nN © = data points 


1S. (Q2X+.+2)aX 17. (0,0), neither max nor min 

19. (0,0), second-derivative test fails. Neither max nor min, since f(x, 0) changes sign at
(0,0).

21. X = 0 is a critical point; second-derivative test fails. Neither max nor min, since a
component of f(X) changes sign near 0.

23. Neither max nor min 


23. Dia Nis f= ah 


= 1.087101 
sl» = 087x109 


27. g:R" —» R* by g(X) ene! ) so 


Ox, “Ox, 
1 
oy 
3 (DYN: | = (ONE = AO. 
7 y= 5.25¢~ 92558 
1 
ies EXERCISES 2.6* 
1 he Sit: s 6 Fi a 4 
= 24s 
(arph for exec 15 of Ererees 23) LAL 0,0), 4X) =0. Ar(1.0, aX) =[~5]+[3 _3][}] 
4 3. a(X) =B + AX 
25, Linear, A= [ff] 27. Linear, A = M7 +N sare 
rr . p = 0 gives a(t) = [0] + |t 
29, f(X) = A,X, g(X) = AgX implies g*f(X) = (AgA YX Soy es collane wocieem mse ene 1 (G 
3. SUX) = B+ AX, £00) = 6 + DX + XTEX with E = ET gives 1 0 
g°/(X) = g(B) + (DA + BTEA)X + NTUATEA)X ip mia ieives m0) | aac 
(Notice that XTATEB is symmetric.) 7. Curve is unit circle. Tangent is x + y = V2 
33. (@) N=LY-1B 9. Curve is helix winding counterclockwise about 2 axis. Tangent is intersection of 
(©) g(/(X)) =D + CB + CAN =X. Taking X=0 gives D+CB=0. So planes x = 1 and y 
g°f(X) = CAX = X = 1X, which shows CA since the matrix of a linear trans- HX, = (20, 15) 


formation is uniquely determined. 


13. fe 
35, (a) (aA) j] = (@AN Ji] = AL jsf] = 0AM) 


tan.x — x, Df(x) = tan? (x), X, = .7702 {tan (1) = 1.544} 


15, f(x.y.z.w) = (x? = y? + 2? — w? — 4, sin (aye), xp EEE 
(b) (A + AT)" = A + ATT = MAT + A) x +ylinzw — 4), 
7 2x -y 2 =2w 
EOS Df= | yew cos(xyzs) xzweos(xyzw) xpw €08 (x xyecos (yyw) 
we i 1 1 1 1 
L 0) [ee oa i tbe Pye i 


x, = (5.65, .991, “4.75, ~1.89) (use 8), 




17. Let w = dz/di. Critical point is (2, w) = (1.0). At (1,0) 


ml=[ alle] 
a: ae 


EXERCISES 3.1 


LL Wert x 3 vyera x 
[1p ¥et4x [1}¥e+e 1 28 0x 
v . 
UYAFT x 9 VEX FO Y 
(1]¥e (S00 74-49 0x) 02 [1 )Z-10+(x#2)¢20¥ 
v ° 
12, (b) FAL TABLE H 13, STATRIGTABLE THA 
U1 JAeBe xo? [1]Anat 2 3+.0 7H 
° [2] Toa. oA 
. 
14. (b) VAL AREAS M 
[1]SeMe <2 
[21K (4) (C4, pS) 9S) 0M) v9? 
v 


1S. SPAN DEGFIT M 
CUDPeM(: 210ME 51) 20, 0W 
v 
17. yZ-SIN GY 
[1] Y-20800 cHoP 0.x 
[2] Z(Xx(1, (4999 4 2),1)+-x¥)=1800 
v 


3. WYeFS x 
[1p Yernxen 
v 
Me ezex Fat, 
E200 Y) 1 
v 


The worst approximation is for x = π/2. On a UNIVAC machine carrying about 18 digits
the value of SIN ○÷2 is correct to 13 digits.


EXERCISES 3.2 


tooo 1 -}0 0 

[os oo 3/2 100 
=3.0 10 “Jo -3 10 
0001 ety Ti 

7. Yes, Els 3] 9. No 

13, Yes, Elo] (empty set) 15. No 


soo- 


IL Yes, £[:3] 




21, 


Soom 
Sao 


es 




100 1 oo
7 jo 10 1% |O -1 0
004 o o41
igi ea eat
23. Jo 1 -3 25.
Cay yi eed
o -10

In exercises 26 through 35 different row-reductions may give different augmentation columns.
10 0)2 
27. (E|FL=]0 1 1) 3]. no solution 
0 0 O14 
100 2)5 
2. KEIFI=|)) 4 | 4 | f-n0 solution 
lo 0 0 ols 
120 2/5 
MLEIFT=1) Of g| sf ne solution 
0 0 0 O10 
10 0/2 
jo 1 0f3 
33. (E1F}=|0 0 1] 4], no solution 
0 0 os 
0 0 O16. 
102460/8 8 
lo 13s 70l9 9 
. 0 0000 1/0] Jo 
ae FIFI=1) 9 0 0 0 0 of *= 0 
0 0000 0) 0 0 
0 0 0 0 0 0! 0 10. 
37. No inverse 
39. One left inverse is 4). 
$2.0 -1 
7230 -1 
8°3 3 1 =L 
—-7 -2 3 =! 
0-1 0 2 
ee 4 
43. A ewer et IS. 
1 


2 = = 
= =s =) 
Shall ral || Eee 
'} of*@l f+] o 
0 0 1 
o 0. 0 

One left inverse is


bi ot A 




10 1200 0 
10 586 0 —767 1.21 A ni naee coicote eo 
47. |o 1 295 0 439 oe ae Ret ESA 
oo 0 1 165 aro Ge CoE ool 
sl 3 33-2 4 5 55. 5x19 Cech 
a8 
39.3 6 17 6 0 o»0 65, 3 ap 20 
6 
® 
EXERCISES 3.3
1. The echelon form of [vy |Ye| | e| |] is 
Weep 257 
bain Singir7ig) 
3. The echelon form of [uj [031 0} ala) Is 
G00) feat 
(het weet 
ye th tie 
5, The echelon form of [|| ||| Wal is 
1000-2 0 -2 
O100 3-1 4 
Bios 0; ae aes 
pmo 7 iF 
7. rank = 2 9. rank =3 
For exercises 11 through 15 the echelon forms of [v₁|v₂|···|vₖ] are
eo) sey 
Tigoyeean ak oes 
Tey Repeal ets |[So0 ae neue! 57] Oe 2 
a. oo 0 O14 Laden Pe 
00 0 0 


17. (a) Equations 1, 3, and 5 are irredundant.
(b) n = 2, k = 3, so no solution.


19. (a) Equations 1 and 2 are irredundant.
(b) n = 3, k = 2, one arbitrary parameter.


22. If G₁A = E = G₂B, then B = G₂⁻¹G₁A, so F = G₂⁻¹G₁. If B = FA, F invertible, then
F is a product of elementary matrices. So E = G₂B = G₂FA. Show A and B both reduce
to E.




EXERCISES 3.4 


7 


19. 


21 


23. 


TzF x 3 veer x 
(ex (1 z-t9x 
(2019/0 [21-(x<-1/0 
(3 ]2-x02 (3)26x-1 
v (4-001) /0 
15]2--00x 
v 
ZF x T vzex Fy: 
(zx [1 )zexey 
[2}-(0n216x)/0 [2)-032(X. 14 XC) & 2 203-3 -3 4) /0 
[3)2-x [312-0 
a v 
Yes 1. Yes 13. Yes 15. Yes 


a = 9999, b = 10,001 + 1/9999; no, 


vA N 
11) 2.2 
[2] (any /0 
[3] 2eF Not 
(4) 22,4 26 zpneayes 2 
Fo 
2 10 210 88410 15630000000 4.888620 4.77864) 4 566E83 
96167 


vZEN 
ty zea 
(2) (rN) /0 
13) ef Net 
14) 2-2, 21N-1]+20Z1N-1] 
v 
FS 
101.84 4.871 1.571 1.579 


vEN 
Oy za4 
[21 (22) 10 
(3) ZF Nt 
[4] Zz, 1t021Z 
v 
F 20 
1 1 5 9 29 65 161 441 1165 2929 7589 19310 49660 - 
126900 325500 833000 2135000 5467000 14010000 
3seg000 




25. UN 
1) 4123 
(2) (32) /0 
(3) 2 Nt 
(4) 242,40 -14.x(-9tZ)93 1 2 
' 
Fe 
1 2 3 -8 ~56 ~3109 9666000 -9 344e13  -6.73127 
7629655 5 81TE111 3 376E229 


EXERCISES 3.5


3 f x] [08682 
1 i -| | + [2] 3. No solution 5. /8l= a 
: % ‘8169 


= aes 0 333 


1.1670 0 —.1667 9 No left inverse 


EXERCISES 3.6* 
ty 4 =0.D.a=04=[} al 


sae} 


3 =i 


3 a= [vy ~ ple ence at = (ANY = Land Oya = AM, = Uy 
Oo -1 1 
7. vy =(1,2.3,@=(2,0,0,4=]1 0 0), hence 
o 10 
=) d 2 -3 3 
waar an «| 3 +2 { = 
1 12 0 3 + 
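9. $A = \begin{bmatrix} .9 & .05 & .05\\ .05 & .75 & .2\\ .1 & .2 & .7 \end{bmatrix}$ and $A^n \approx \begin{bmatrix} .424 & .303 & .273\\ .424 & .303 & .273\\ .424 & .303 & .273 \end{bmatrix}$


and so the long-term distribution of fleas is 42 percent on color 1, 30 percent on color
2, and 27 percent on color 3.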
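The long-term distribution can be recomputed by applying the transition matrix repeatedly to any starting distribution. The Python sketch below assumes that the transition matrix of exercise 9 reads [.9 .05 .05; .05 .75 .2; .1 .2 .7]. That is our reading of a damaged scan, but each row sums to 1, and its stationary vector (14/33, 10/33, 9/33) matches the percentages in the answer. Distributions are written as row vectors.

```python
def stationary(p, steps=500):
    """Row vector x with x P = x, found by repeated multiplication from a uniform start."""
    n = len(p)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]
    return x


if __name__ == "__main__":
    # Assumed reading of the transition matrix of exercise 9 (rows sum to 1).
    p = [[0.90, 0.05, 0.05],
         [0.05, 0.75, 0.20],
         [0.10, 0.20, 0.70]]
    print([round(v, 3) for v in stationary(p)])
```

It prints [0.424, 0.303, 0.273].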


EXERCISES 3.7* 


I WZ-€ CENTSecT 1 3. 2331 S. 1230 
[11L=YeFON x- 100 CHOP 1 
[21Z-text-1 0 + ((x¥)exYENII OT 
[9}(Es|-/1y/t 




— 9684, = .1228) 9. (.1435, 1.549, 1,849) 


EXERCISES 3.8* 


H=9 Ye=t Yy= Bi s(x) = het — do, 
sx) = Hs — 27) + 8x — 1) + Hx — 2), The four conditions thus become 


( 540 =1, 50 
Q 40) =1, 5,2 
QB) sq) 


(4) 57a) 


which are easily checked. 
3. The output of SPLINE is 


cr 
2.31130 8721100 68. 60000 
5 35480 93.95500 87. 21100 
9.42050 68 56900 93. 95500 
2.96730 9976700 68 56900 
796130 11316000 99. 76700 
41300 7878700 113. 16000 
130.89000 78. 78700 

2104400 6725600 130. 89000 
42. 98600 12989000 67. 25600 
0 00000 115, 20000 -129 89000 


EXERCISES 4.1 


1. The point is not on the line.
3. The point is on the line.

5. The lines intersect at the point (−6, 2).
7. Intersection for ¢ = 
9 
1 


Parallel and distinct. 

11. The point is not on the plane.

13. Intersection for (t₁, t₂, t₃) = (38, −15, 22)

15. No intersection, line parallel to plane, 

7 = 68, orf 
= forty =3 — 2a fy = 5 — 4a. 


17. Line of intersection given by 5, 
19. Line of intersection given by s, 
21. Distinct parallel planes. 


23. Write s = 1 − t; then r = (1 − t)p + tq = p + t(q − p) = l(t). Now use Proposition 4.1.
25. Since the figure is a parallelogram, (b − a) + (d − a) = c − a, or b + d = c + a.


27. Let the plane be p(t₁, t₂) = p₀ + t₁v₁ + t₂v₂, and let the line be q(t) = q₀ + tw.


If there is no intersection, then the echelon form of [v₁|v₂|−w|q₀ − p₀] is
(ileal —wla = Pol G2 1 0 10 CHOP 0.02 


109 (8C).8(2 293 1-3 1) 4.x 
Pai a 1 00£0 0.000 = 3.0060, 3.0020 
Fatal B09F-1 5.8861 3.0160 1. 84e0 
3.09E1 9.5161 180 2.40E-2 
Now apply Proposition 4.5. 3096-1 9.516% 2 406-2 1. 8BE0 
8 .09E + 5.8861 -1. 8460 3.0160 
~1 0060 1036-18 ~-3.00£0 3. 00£0 
EXERCISES 4.2 8. 09E-1 “8 88E1 3.0160 1.8460 
je es x o -0 “3096-1 esre1 1 2406-2 
Anca ae as ; ; 3096-1 ose 2 18860 
8.09E 1 s.88E1 1 3.0160 
ee ee eee rey rere ah 8 1 00€0 207E18 3 ‘ sapere 
oie y = aint 3. m= 0.011 = 20% los} 
0 -1)y, 
3 ae tao ra 5 se hol as 15, Po = (0.0) P= (0, =Ns pe = (U0,X=[_} oly 
sy 3 4-2 y en Le (7, 2x + 3y =8 and 2x — y =0 intersect at py = (1,2). 2v + 3) = 8 has parametric 
2 i} 2 0 z -4 3 3 =n 


+{ 7] 
so [ Bal points along the x" axis and 


Pi Po 


Be (a reeset 
Similarly 


aaoedf| tee 


Mab 
ales 


19. (a) Coplanar 


E2N0T., i Re Ly. 
ai Y= (5 al” 2 2s Si 
10 0 
27. Y’=|0 0 O;X' 29. tet 
0 0 3 


a re de[ a glt ipeswe x= [Jer an» 


a re(ol+G cle *=[a]+[ a} 


bb ol* 


y 




9. A line   11. Parabola
13. Minimum at (5, —1) 15. Line of minima through (.5, 0) and (1, 1) 
17. Saddle point at (—5, 5) 19. Parabola 
¢ | 4a) 
21. Multiply out X7QX = (11 -X7]]——||- 
rr 4B) a}|y 
23. 2. NEWQUAD OP 


(11 BePREP P 
12] 2-(9B)+ x (0 Sxor80)+. xe 
(3) 2200.1. 2xNZIV I, 10.4 hz 


EXERCISES 4.4 


1. py — Po = Up — Po) + Py — po)- A plane in Ri. 
3. ps — Po = AP, — Po) + H ps — po) A plane in R’. 


5: Ps — Po = AP) ~ Pod + 1 Ps ~ Pods Po ~ Po = MP) — Po) + 3 Pa — Po A 3-Mat 
in RY. 

0 0-1 7. The line (1-flat) in R¥ through (4, 5,0, 6) and (2, 2, 1,6) 

Cee ale 9 The point (0-flat) (2, 3,4, 5) in RY. 


11. The point (—15, ~20, 47, 25). 
13. The line and the 3-flat do not intersect.
15. The plane is contained in the 3-flat.


39. (a) Let A and B be the partitioned matrices. Let C = AB. Then (Proposition 2.1)


1ooo0o0 
et jo 2000 
ale Salt) 17. Linear with matrix ]0 0 3.0 0 
00040 
Analyze the cases (= 1 fst ssl yA ex Lj ahi and / fi / #1 separately, lo 0005 
(b) Use (a) to check that the given A-* is in fact an inverse. 19. fix xPaxg + xh 
41. The hint is sufficient. 2 fx SP NF 4x8 +7. 
43. xP NEWTOOLD XPRIME 23, (mp0) =((m.0)0)+ x10 
[1] X1 OL(PREP Pye nt, [1 )KPRIME 25. The hint is sufficient. 
% No change necessary for exercises 27, 29, 31, 33, 35. 
45. Z<P NEWAFFN BA. B 
Le ates EXERCISES 4.5 
[2] 2-4 01(( (12018984). 118A) * xE)BE 
y 1. Basis (p, — po 1,2 3. Basis {p, — pol ( = 1.2.3) 
5. Basis (p, — pol? = 12.3.4) 7. Condition 3 fails. 
9. Conditions | and 3 fail. 
EXERCISES 4.3 The echelon forms of the matrices of exercises 13 through 19 are 
1. Ellipse 3. Imaginary — by inspection 1020 10-2) 4 
5, Imaginary ellipse a De 3. Jo 13.0 is. Jo 13 5 
ginary ellips generate hyperbola Rae re eek 




126 Taio) uy 
oo = oeonteesinens 
Yea a) <0 1 loco 0 ioe 70 
000 0 000 0 0 
The answers for exercise 13 are 
Column space: (—2,3, —1), (—2.2. De —LD 
Null space: (−2, −3, 1, 0)
Row space: (1, 0, 2, 0), (0, 1, 3, 0), (0, 0, 0, 1)


dim (null space Aᵀ) = 3 − dim (column space Aᵀ) = 0


21. Intersecting in a line, the planes cannot be parallel. 
23. The planes are skew: nonintersecting and not parallel. 
25. The hint is sufficient. 


In exercises 27 through 33 the null space is the column space of the following full-rank 
matrices, which are displayed to two significant digits 


2. 6 7E1 2 0.40 0.80 
1 0€0 0.00 1.00 
3364 100 0.00 
356-18 0.60 ~0.20 
u 1.00 0.00 33. 1.00 0.00 0.00 
0.50 -0,s0 0.50 -0.50 0.50 
0.00 1,00 9.00 1.00 1.00 
0.00 1,00 0.00 1.00 0.00 
0.00 0.00 1.00 
35. F⁻¹(S) spanned by (−2, −3, 1)   37. F⁻¹(S) spanned by (−2, −3, 1)
39. ∇Z←PIVS E
[1] Z←⍳0
12) Te (MeMe (1 TEBE TED ED
Cale eersiseeyea
ih senMielystaoae

By induction on the number of nonzero rows of E. The result is ⍳0 if E = 0.
Otherwise find the index of the first nonzero entry in the first row; call it J.
Then the answer is J catenated with the answer for E with the first row discarded.
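The recursive description of PIVS translates directly into Python. The sketch below is ours, not the book's listing. It uses 0-origin column indices (APL's default origin 1 would give 1 2 4 for the example matrix) and an optional tolerance for deciding which entries count as zero.

```python
def pivots(e, tol=0.0):
    """Pivot column indices (0-based) of a matrix E already in row echelon form.

    Follows the recursive description: empty if E has no nonzero rows; otherwise
    the first nonzero column of the first row, followed by the pivots of E with
    its first row discarded.
    """
    if not e:
        return []
    nonzero = [j for j, x in enumerate(e[0]) if abs(x) > tol]
    if not nonzero:
        # In echelon form a zero first row means every remaining row is zero.
        return []
    return [nonzero[0]] + pivots(e[1:], tol)


if __name__ == "__main__":
    print(pivots([[1, 0, 2, 0], [0, 1, 3, 0], [0, 0, 0, 1]]))
```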
4L Y Z-COLSPACE AE 


[11 E-ECHELON A 
[2] ZAL: (Fr, lA)PIVS E} 


v 
43. Y Z.NULLSPAGE2 A {G:E\M 
[1] E-(GGAUSS A)+ «ALBA 
(2) Mere sA 
(3) 28((+/ (EM). aM) 046 
v 


The expression +/(E+M)\. 2M counts the number of rows of E that are not negligible
compared to the input data.
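One reading of the negligibility test is that an entry e of E is negligible when M + e = M in floating point, where M has the size of the input data. Under that reading it can be sketched in Python as a numerical rank. We assume M is the largest absolute entry of A and use Gaussian elimination with partial pivoting. Both are our own choices, since the scanned listing of NULLSPACE2 is only partly legible.

```python
def echelon(a, scale):
    """Row echelon form by Gaussian elimination with partial pivoting.

    A candidate pivot p is skipped when scale + |p| == scale, i.e. when it is
    negligible next to the size of the data.
    """
    rows = [[float(x) for x in row] for row in a]
    m, n = len(rows), len(rows[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = max(range(r, m), key=lambda i: abs(rows[i][c]))
        if scale + abs(rows[p][c]) == scale:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, m):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return rows


def numerical_rank(a):
    """Number of echelon rows not negligible compared with M = max |a_ij|."""
    m_scale = max(abs(x) for row in a for x in row)
    e = echelon(a, m_scale)
    return sum(1 for row in e if any(m_scale + x != m_scale for x in row))


if __name__ == "__main__":
    print(numerical_rank([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```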




EXERCISES 5.1 

1. (V222) 

3. $(\sqrt{2}, \cos^{-1}(1/\sqrt{3}))$

5. $(\sqrt{n - 1}, \cos^{-1}(1/\sqrt{n})) \to (\infty, \pi/2)$ as $n \to \infty$


(Note: In exercises 7 through 11, A[1;2] refers to the exterior angle at p, whereas
A[1;3] and A[2;3] refer to interior angles.)


7. (b). (e) % (a) 
11. (d) Notice that −√2 + √18 = 2√2.
13, sav ost w 15. vz-v DSPREAD W 
(1)2-NoRM vw [112 (V SPREAD W)-0~180 
, v 
17, sz-sives A 


[VJAR (HA) AALS 1 3 B]-AL2 1 2] 
[21Z-(1 HA) eo? 
v 
19 If uTX = 0, then 


x 


boat 
1 iene. ant ua 


oe ay 
21. Itutx 


0, then 


so take o = (1, 1,0), say. Then [uju}/X = 0 implies 


4 
X 4 so take w = (—1, 1,2), say. 
1 
° 3 f= 071m 

23. Yes as. vessino=[i]+ [79 UIE] 
27. Yes 29. Yes (cf. exercise 23)
31. Yes (cf. exercise 25) 33. No (cf. exercise 28)
35. If the triangle has sides u, v, and u + v as in Examples 5.3 and 5.4,
then $\|u + v\|^2 = (u + v)\cdot(u + v) = \|u\|^2 + \|v\|^2 + 2u\cdot v$, but $u\cdot v = \|u\|\,\|v\|\cos\theta$ by
the definition of θ.
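37. The columns of A = [u|v] are linearly dependent if and only if AᵀA is not
invertible, and $A^TA = \begin{bmatrix} u\cdot u & u\cdot v\\ u\cdot v & v\cdot v \end{bmatrix}$
is invertible if and only if $(u\cdot u)(v\cdot v) - (u\cdot v)^2 \neq 0$.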




39. $d(f(p), f(q))^2 = \|f(p) - f(q)\|^2 = \|A(p - q)\|^2 = (A(p - q))^T A(p - q)$
$= (p - q)^T A^T A (p - q) = \|p - q\|^2 = d(p, q)^2$
Similarly for angles.
41, d(h(p), Ag) = di figip), gq 
43. Reflection in x axis of R² is


Fe ed 
Rotation of 180° about x axis in R® is 
at He 
0-1 Offe, 
o o —1){x,| 


If we identify (x₁, x₂) with (x₁, x₂, 0), the effect is the same for points in the xy plane.
45. fis linear, 


(gp). (9) = d(p.9) 


Sle) = foe) 


W/CM | Sow Wen wil 
A S(P). SQ) = WP) = A = lap ~ agit = \oliip — gi = leld(p.g). 
47, y =q + RyX where q =p — Ryp 
49. Let g(X) = (1/a)X. Then see the answer to exercise 33,


A¢ghp), 8(q)) = (A/addi p,q), Then 


a(Lae igh) = Lac sep. fig =Au(p.q) =dp.9) 


so g is a congruence and, for example, 


JX) = agiXy =a(b+ ali =a) 


w +ak,[) i) 


where b' = ab 
51. $v\cdot v = v_1^2 + v_2^2 + \cdots + v_n^2 \ge 0$. Further, v·v = 0 must mean that vᵢ² = 0, or
vᵢ = 0, for all i, since there can be no cancellation among the vᵢ².

53. (c) No 


EXERCISES 5.2


Lv Zan A is 
oy z-0 
(2) -4s-aqrsaqsser/a 1ejayvo 
[9] 26((0-2)--30(AL1 51 J-A[2:2)) 24a 1.21) -2 
7 




3. o--L) 7): egenvatues: 5. 0; positive B i ha 


vypl =... 
cee] =i ‘th eigenvalues +1, not positive 


cos —sing 


sin @ cos oJ Coin) 0 
7. Q= eigenvalues +1 
cos@ sin® 
[ey 828), Gin29 <0, 


The matrix represents reflection in the line through the origin at an angle θ to the x axis.
The matrix Q obtained from ROT ANG A would just be


jee sind 
sind cos. 


9. 3Vi + 2VRI WN. V3, v2 
1B. +O-JACOB! A 


0.914697 0.393316 0.092911 Eigenvalues are 459, 4.18, ~3.64 
0.390436 0 800646 0.454451 A is not positive 


© 104353 -0 451960 0 885913 
15, +OeJAcoB! A 
© 657192 -0 260956 0.707107 _ Eigenvalues are 4,56, .439, 0, 
© 369048 0 929410 0 000000 A is positive 
0.657192 ~0 260956 0 707107 
8 


1.403620 0.788205 1 403620 
0.172793 0.618412 -0 172793 
© 000000 0 000000 0 000000 


7. 4O-JACOBI A-HILB 6 

© 062 0.615 0.749 0.011 0.00; 0 240 Eigenvalues are 1.62, 

0.491 0.211 0.441 0.180 -0 035 0 698 242, .0163, ,000616, 

0 535 0.366 ~0 321 0 604 0 241 0 231 0000126. 

0.417 0.395 0.254 -0 444 -0 625 0.133 A is positive definite. 

0 047 0.388 -0.212 -0 442 0.690 0.363 

0 541 0.371 -0.181 0 459 -0 272 0.503 

5 

1 S4E-3 1.3362 1036-2 Va7eE3 1 34e 
3.03E1 1.806 1 194e4 1.91E1 1B2E 
9.53e"1 4 08e+ 3246-1 2.6961 2.316 
3.9565 214E-3 1576-3 1.57E 3 1.63e 
a7 7.9265 -2.06€-4 2.2764 8.948 
3.0762 2966-2 170-2 4.6362 «6. AZE 
19. Let Q←JACOBI A, and X = p₀ + QX′. Then f(X′) =
9.25 − 1.555(x′)² − .1981(…, so p₀ is a maximum point.
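The eigenvalues quoted for exercise 17 (the 6-by-6 Hilbert matrix) can be cross-checked with a short Python version of the cyclic Jacobi method. This is our sketch of the classical rotation scheme, not a transcription of the book's JACOBI. The rotation convention, the stopping tolerance, and the sweep limit are assumptions.

```python
import math


def hilbert(n):
    """n-by-n Hilbert matrix with entries 1/(i + j + 1) for 0-based i, j."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]


def rotate(a, p, q, c, s):
    """Overwrite A with P^T A P, where P[p][p] = P[q][q] = c, P[p][q] = s, P[q][p] = -s."""
    n = len(a)
    for k in range(n):                       # A <- A P: columns p and q change
        akp, akq = a[k][p], a[k][q]
        a[k][p] = c * akp - s * akq
        a[k][q] = s * akp + c * akq
    for k in range(n):                       # A <- P^T A: rows p and q change
        apk, aqk = a[p][k], a[q][k]
        a[p][k] = c * apk - s * aqk
        a[q][k] = s * apk + c * aqk


def jacobi_eigenvalues(a, tol=1e-14, max_sweeps=50):
    """Cyclic Jacobi method for a symmetric matrix; eigenvalues, largest first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # |t| <= pi/4 with tan 2t = 2 a_pq / (a_qq - a_pp) makes the new a_pq zero.
                diff = a[q][q] - a[p][p]
                t = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                rotate(a, p, q, math.cos(t), math.sin(t))
    return sorted((a[i][i] for i in range(n)), reverse=True)


if __name__ == "__main__":
    print(jacobi_eigenvalues(hilbert(6)))
```

Rounded to three figures, the five largest values should agree with 1.62, .242, .0163, .000616, and .0000126. The sixth is about 10⁻⁷.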




21, Writing /(X) =¢ + BX + NTAX, one has 
ECHELON A,8-B+2 


10000060 2. 60208E-18 2.000000 8 67362E-19 
1.73472E-18 1.000000 3.000000 4.39681E-19 
0. 00000e0 2.60209€-18 2 60209E 18 1 .00000E0 


Thus the linear term cannot be eliminated and there are no critical points.

23. If f(X) = c + BX + XᵀAX, B considered a vector, then the critical points are the
solutions of Df(X) = B + 2AX = 0. On the other hand, the coordinate change
X = p₀ + X′ will eliminate the linear term if and only if B + 2Ap₀ = 0.

25. The hint is sufficient.


27. (O7Q)1 = ONK:KIQIA:K] + Q'K:LIQ[L:K] = RER, +0 =F; 
(QTO): = ONK:KIO[K:L] + ONKLIO[L:L] =0 + 0 = 0, and so on. 
29. (a) DI (ONATA)Q)K:K] = OTK:K(ATAVOIEK] 


OU:KFATAOL:K] 
(b) UTU = (AQ(:KID[K; KP YTAQL:KID[K; KY! 

= D[K; K}'Q[;KTATAQL:K DIK: KY 

= DIK, K KPDIK, KI by part (a) 
(©) 0 = (QTATAQY:L] = OTATAOL:LJ implies ATAQL:L 
inverse Q. Proposition 2.14 now shows that AQ{:L) = 0. 
(4) QOT =(QOMEN, «N] = Of N JOA, N] + OL NLIOML: NY} 
QUKIOGAI! + OL LIOLLT 


= 0, since QT has the left 


(ce) 0: KIDLK: KP 'D[K: KJQU:K |" 
OU: KJOL:K]P 
1207 — AQ|:LIO[:L]" by part (d) 
I, since QT = Qt 
31, The hint is sufficient 33. The hint is sufficient 


EXERCISES 5.3*

1. (a) Obvious (b) The eigenvalues are −1, 2.

3. (a) Eigenvalues: ~6,16, ~6, —4.24, —4, .162, .236; saddle 
(b) Eigenvalues: 4.24, 4, 6.16, 6, ~.236, ~.162; saddle 
(©) Eigenvalues: 424, 1, -2, 236, 414; saddle 


EXERCISES 5.4 


hk (-20 3. S* = {0}, basis is empty set ((,0)90). 
5. (-2,-3,1) 7. (hh =11) 

0-3 

=a 4 
9. r 6 

0 -5 

o 1 


31 


33. 


38. 


37. 




(a) Since s in S implies s ⊥ v for every v in S⊥, one has S contained in (S⊥)⊥, but
dim S = dim (S⊥)⊥.


(b) Null space (A) = (row space A)⊥, so null space (A)⊥ = row space (A). The rows of


[ 1 = 
2-1 -1 
are independent 
v8 = (1,0), 08 

i ), (0, 5), dist = .707 


3.96, — 515, 3.97, 099), v 


(039, .515, 029, —.099), dist = 526 


548 21 eel 
v2 WE -1/v5 ra 
Wy -Wv¥6 Wy 23. ‘i 
0 V6 v3 H 

-I a | o- 

femal) ih te =all i} &=[1 oo) 

ab dist = v2 


gk £ A ton =r =t 
-' 3 ft 0 || dae eae 
t 1 3-1)" by -! -! 1 1 
t 1-1 3 =! =! i) 1 
1-1 1 ! 

-1 i) 1 1 

SEAN Gi oki ot 

1 1-1 1 

v= (1,0,4.0), vt =(0,0,0,0), dist =0 


The hint is sufficient 


reales Ts =f Ly col ab 


0 0 
=uuT 
ie ee lb o} 


El-s th 


perce: r= ["] conan freon) 




39. The plane is parallel to the xy plane. Hence, by inspection, 


10 0) o) fio oj 
o 1 olx; y=lol+fo 1 | 
000 al foo -1 
110 y for o 
4 ¥ v1 ofx; y=]-if+]1 0 ox; 
000 2} loo -1 
4B. ¥=4 
1 
-I 
43 
1 
45, a 
lo 
27) [3 2 
-1 0 -1 
YT oltil o a alice 
1 0 1 
3-1 
; -1 
41. Y Sie 
=! =I 
1-1 1 
-1 1 
= =I 4}, 
-1 =I ' 
2 
4 
49, Y=f(X) = (7,6, =10; Y=! 3) — xs (1.7.6, -10) 
-20 
EXERCISES 5.5 
1. (b) sprtera Als!) is 
me nQe icc) 
e-! 0 0 
Q=l9 o -1 0 
i (ONO) el 




and OTA is 
1 a) 
oO Oe 
oO -. 0 
i) « 
1 1Q,4is 


and SPRFLCTR Adl:2) is 
W/y2 -1/V2 0 


Vy2 -1/V2 0 
0 0 1 


Set A= 1 110A, and Q, = SPRFLCTR Ay, Then 


oof Et] 


1-1 24 sD) 
»aeolt DCL 
‘vi lo 0 
ee Wve -1/V3 i Vy2 
5. A=] 0 2/V6 1/V3 ‘0 ave /Vvo 
v2 -VVe W/V 2/3 
7. ∇Z←PRJCTR A;U                ∇Z←RFLCTR A
   [1] Z←U+.×⍉U←ORTHO A          [1] Z←(ID 1↑⍴A)−2×PRJCTR A
   ∇                             ∇


11. ∇Z←ORDIST A
[1] Z←((A[;1]−(PRJCTR SBSP A)+.×A[;1])+.*2)*÷2
∇


or


∇Z←ORDIST A
[1] Z←((A[;1] COMPS SBSP A)[;2]+.*2)*÷2
∇
13. ∇Z←REFL A;P
[1] Z←(ID 1↑⍴A)−2×P←PRJCTR SBSP A
[2126 (20a. 11-P+ AL TDIZ
∇
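The listings above form (ID 1↑⍴A)−2×P, that is, I − 2P, from an orthogonal projection P. This matrix keeps the component orthogonal to the subspace and reverses the component in it. A Python sketch of the same construction follows. The function names are ours, and we assume the spanning vectors are linearly independent, so that Gram-Schmidt never divides by zero.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def projector(vectors):
    """Orthogonal projection matrix onto the span of the given (independent) vectors."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:                      # Gram-Schmidt: remove earlier components
            d = dot(q, w)
            w = [wi - d * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    n = len(vectors[0])
    return [[sum(q[i] * q[j] for q in basis) for j in range(n)] for i in range(n)]


def reflector(vectors):
    """I - 2P: negates the component in the subspace, keeps the orthogonal component."""
    p = projector(vectors)
    n = len(p)
    return [[(1.0 if i == j else 0.0) - 2.0 * p[i][j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in reflector([[1, 1, 0]]):
        print([round(x, 12) for x in row])
```

For the single vector (1, 1, 0), the result swaps and negates the first two coordinates: it is [0 −1 0; −1 0 0; 0 0 1].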
14. Check answers by comparing A+.×U to zero. The dimension can be checked from
the answers to exercises 40 through 47 of Exercises 5.4.
16. No solution. There is a plane of least-squares solutions: $p_1 = (-5.580, -8.060, 1, 1)$, $p_2 = (-3.580, -5.060, 0, 1)$, $p_3 = (.424, -.060, 0, 0)$.


18. (-.08641, .88981, .22340, .81690)
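The APL answers to exercises 7, 11, and 13 above all rest on an orthonormal basis for a subspace. As a cross-check of the underlying linear algebra (not a transcription of the book's functions), here is a minimal Python sketch using only the standard library. The names `ortho`, `projector`, `reflector`, and `distance_to_span` are hypothetical stand-ins chosen for this illustration; matrices are lists of rows, and a subspace is given as a list of spanning vectors.

```python
# Projection, reflection, and distance to a subspace, using plain lists.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def ortho(cols):
    """Gram-Schmidt: an orthonormal basis for the span of the vectors in cols.
    Vectors that are (numerically) dependent on earlier ones are skipped."""
    basis = []
    for v in cols:
        w = [float(x) for x in v]
        for q in basis:
            c = dot(q, w)                      # component of w along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = dot(w, w) ** 0.5
        if n > 1e-12:
            basis.append([wi / n for wi in w])
    return basis

def projector(cols):
    """P = sum of q q^T over the orthonormal basis vectors q (that is, U U^T)."""
    U = ortho(cols)
    m = len(cols[0])
    return [[sum(q[i] * q[j] for q in U) for j in range(m)] for i in range(m)]

def reflector(cols):
    """R = I - 2P: fixes the orthogonal complement of span(cols)
    and reverses every vector in span(cols)."""
    P = projector(cols)
    m = len(P)
    return [[(1.0 if i == j else 0.0) - 2.0 * P[i][j] for j in range(m)]
            for i in range(m)]

def distance_to_span(x, cols):
    """Length of the component of x perpendicular to span(cols)."""
    P = projector(cols)
    perp = [xi - dot(row, x) for xi, row in zip(x, P)]
    return dot(perp, perp) ** 0.5
```

For the xy plane in $R^3$ (spanned by (1, 0, 0) and (0, 1, 0)), `projector` returns diag(1, 1, 0), `reflector` returns diag(-1, -1, 1), and `distance_to_span([1, 2, 3], ...)` returns 3.0.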




EXERCISES 5.6*


1 @ B=(0,0,44" =4[5 (i]s he principal I-at isthe x axis fa > Vand the 


yanis fac. 
(b) This is part (a) for a Since P=0, pp =p is piow where 
u = (—sind, cos) and S (p,.u)* = 4, independent of #. 


3. The stable axes are parallel to the longest and shortest sides; the unstable axis is 
parallel to the side of intermediate length. 


5. The stable axes are parallel to (.927, .376, 0) and (0, 0, 1). The unstable axis is parallel
to (-.376, .927, 0).


6. (a) OTO =I, hence A =(QT)T = (QF) 
(b) Obvious, (c) 1 = CU, HILEH) > NUL. 


9. The hint is sufficient. 


8. The hint is sufficient.


EXERCISES 6.1 


1. Maximize $z = 2x + 3y$
   subject to $-5x + 6y \le -7$
   $7x + 8y \le 9$
   $x, y \ge 0$
3. Minimize $z = -2x - 3y$
   subject to $5x - 6y \ge 7$
   $-7x - 8y \ge -9$
   $x, y \ge 0$
5. Maximize $z = 2x + 3y$
   subject to $5x - 6y \le 7$
   $-5x + 6y \le -7$
   $7x + 8y \le 9$
   $x, y \ge 0$
7. Maximize $z = 2x + 3(y' - 2)$
   subject to
9. Maximize $z = 2x + 3y$
   subject to $5(x' - 2) - 6(y' - 2) \le 7$
   $-5(x' - 2) + 6(y' - 2) \le -7$
   $7(x' - 2) + 8(y' - 2) \le 9$
   $-7(x' - 2) - 8(y' - 2) \le -9$
   $x', y' \ge 0$
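11. Maximize $z = -(d^{+} + d^{-})$
    subject to $2(x^{+} - x^{-}) + (y^{+} - y^{-}) - (d^{+} - d^{-}) \le 4$
    $-2(x^{+} - x^{-}) - (y^{+} - y^{-}) + (d^{+} - d^{-}) \le -4$
    $x^{+}, x^{-}, y^{+}, y^{-}, d^{+}, d^{-} \ge 0$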
13. Maximize z= —(dj +d; +3 +d3) 
subject to 2x? — x7) + yt =p) — (dj dy <4 
= Ux — x) =H yt =) +d} ~ dy < -4 
Ax — a) + Hy? yr) — (dg — ag <5 
=Uxt = x) = xt =) + (dg — dg) < -5 
tw dh, dz, dg, dz >0 
15. Maximize $z = 2x' + 3y^{+} - 3y^{-}$
    subject to $x' \le 9$ $(x' = x + 10)$
    $x', y^{+}, y^{-} \ge 0$


7. 


19. 


2 


23. 


2s. 


27. 


29. 


31 


33. 


38. 




Maximize z= d* + d- 
subject to 4x’ + Sy’ +d? —d- =23 
xiyidd->0 (x 
Maximize z= 2x* — 3x~ 
subject to Yx* —x-) +37 — yp) <4 
Maximize = = 2x + Sy 
subject to 10x + 70y < 490 
dw 43y <32 
20x + 10y < 240 
xy>0 
Minimize $w = 4x_{11} + 6x_{12} + 8x_{13} + 5x_{21} + 7x_{22} + 12x_{23}$
subject to $x_{11} + x_{12} + x_{13} \le 10$
$x_{21} + x_{22} + x_{23} \le 5$
$x_{11} + x_{21} \ge 2$
$x_{12} + x_{22} \ge 5$
$x_{13} + x_{23} \ge 3$
$x_{ij} \ge 0$, all $i, j$
10 t 
a=]! TT ea |2l. 12 unknowns, & 6 
= [taf 8 = |S]. 12 unknowns, & constraints 
14 5) 
i) i 1 
12 2 
A=|! 4) ga}5], 21 unknowns, 18 constraint 
=]] $f. 14}, 21 unknowns, 18 constraints 
! 6 7 
nD, 8 
00 0 0 
igo 3 
96 4 1 
A= 11 a6) B= if 19 variables, 18 constraints 
Heres wk 2 
Ue We | 1 


(a), (b) Minimize $w = \|1 - x - 2y\|$ (or $w = |1 - x - 2y|$)
subject to $x \ge 0$, $y \ge 0$


The minimum value of w is 0 if and only if the answer to (a) is "Yes." 
Part (b) is the same problem as part (a). 


Minimize $w = \|2x + 3y - 6\|$ (or $w = |2x + 3y - 6|$)
subject to —1 <x < -2.2<y<3 
| 


(-b ? If 


subject to 3<x<4,3<y<4,2<2<3 


Minimize w = 


1 




EXERCISES 6.2 


Vertices: (-2, 1), (1, 2), (0, -1); max at (1, 2), min at (0, -1)
Vertices: (-2, 2), (1, 3), (3, 1); max at (1, 3), no min
Vertices: (-1, 1), (2, 2), (1, -1); max from (-1, 1) to (1, -1), min at (2, 2)
v=(-12 +1, -—),0<6<3; w =(0,0,1) 
oll 
oe 
0 1 


tor[_5 SHAE lla! nyo 


3, 4), X* = (0,0, 3), ¥ 1, 1,0), ¥" = (0,0) 
N= (4.4), X7 = (0,0,0), ¥ =(—44.0) + 1, -F.D, ESS, Y= (0.0) 
(0,1) + 1, YS OF (0, 3,2), ¥ = (2,00), ¥" = (0,0) 


(a) Minimize $w = d^{+} + d^{-}$
subject to $2x + d^{+} - d^{-} \ge -1$
$-2x - d^{+} + d^{-} \ge 1$
$x, d^{+}, d^{-} \ge 0$
(b) The solution occurs all along the ray $l(t) = (0, 1) + t(1, 1)$, $t \ge 0$.


(0.1) + CL, WY, > 0; X! = (2,2,0); Y= (0,0), ¥ = (0,0, 1); that is, 
0, d> = 1, and hence w = 1 
EXERCISES 6.3
1. X = (4, 1), X′ = (0, 0, 1), Y = (6, 4, 0), Y′ = (0, 0)
3. X = (8, 0), X′ = (0, 4, 10), Y = (2, 0, 0), Y′ = (0, 10)
5. X = (5, 3), X′ = (0, 3, 0), Y = (1, 0, 4), Y′ = (0, 0)
7. X = (6, 5), X′ = (3, 0, 0), Y = (0, 10, 1), Y′ = (0, 0)
9 X= (4,2), X” = (0,6.8,0), ¥ = (10.0.0, 10), ¥" = (0,0, 110) 
Me X= (7,7.4), X! = 0,0,8,0, 1), ¥ = (6.4, 10,00), ¥” 
13. X = (1, 10, 10), X′ = (8, 0, 0, 9, 0), Y = (0, 9, 8, 0, 8), Y′ = 0
15. X = (4, 10, 0), X′ = (0, 8, 9, 4, 0), Y = (9, 0, 0, 0, 8), Y′ = (0, 0, 90)
17. (2, 2), (5, 4), (4, 1)
19. (0,4), (1,2), B,D, (7.0) 
21. (2, 2). The set is a single point (the maximum is unique).
23. The hint is sufficient.
25. (a) The maximum is attained at a unique vertex in $R^3$. The intersection of the line and the triangle is the point (2, 1).
(b) The maximum is attained along an edge in $R^3$, but it projects to the single point (2, 3) in $R^2$.
(c) The maximum is attained along a face in $R^3$, but it projects to the segment from (2, 1.9) to (4, 1) in $R^2$.
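The answers to Exercises 6.3 list X, X′, Y, and Y′. Assuming (this is not stated in this section) that X′ holds the primal slack values and Y′ the dual surplus values of a standard primal-dual pair, each answer can be checked against complementary slackness: a positive primal variable must pair with a zero dual surplus, and a positive slack with a zero dual variable. A small Python sketch of that check, with a hypothetical function name:

```python
def complementary_slackness_holds(X, Xs, Y, Ys):
    """Check x_j * ys_j == 0 for each primal variable and its dual surplus,
    and xs_i * y_i == 0 for each primal slack and its dual variable.
    Assumes Xs are primal slacks and Ys are dual surpluses."""
    return (len(X) == len(Ys) and len(Xs) == len(Y)
            and all(x * ys == 0 for x, ys in zip(X, Ys))
            and all(xs * y == 0 for xs, y in zip(Xs, Y)))

# Answer 1 of Exercises 6.3: X, X', Y, Y'
print(complementary_slackness_holds((4, 1), (0, 0, 1), (6, 4, 0), (0, 0)))
```

A `False` result would signal either a transcription error in an answer or a different meaning for the primed vectors.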




EXERCISES 6.4* 


Answers to exercises 1 through 6 can be found in reference 1 or 2.
7. The hint is sufficient.
9. The hint is sufficient. 

11. (a) The payoff matrix is skew-symmetric.

(b) p=q=(hbd) 

13. p = (4.0.3), q = (0.4.3), payoff = 0
16. p = (0, 1) is the pure dove strategy, and q = (1, 0) is pure hawk. Since $E(p, p) = a/2 - c$ and $E(q, p) = a > E(p, p)$, the first condition of Definition 6.8 is violated.

18. The hint is sufficient.


20, Addinga = —a,4 = {0 peana E(p.q) = b4[2) <b, Sop = (0, 1) is evolutionar- 
ily stable, 


EXERCISES 7.1 


For exercises 1, 3, 5, and 7, the determinant is 6.
For exercises 9, 11, 13, and 15, the determinant is -6.
For exercises 17, 19, 21, and 23, the determinant is 1.
For exercises 25 and 27, the determinant is -6.


29. -55   31. 4   33. 8
38. 2 37. (-1)"" 39, 1.414 
41. 4   43. 1.414   45. 19.60


47. The matrix is not invertible, since its columns are dependent. Hence det (A) = 0.
49. The hint is sufficient.
51. (b) $\det(A^TA) = (\det A)^2$. Since $A^TA$ is symmetric, apply part (a).
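Answers 47 and 51(b) are easy to confirm numerically. The sketch below, in Python with hypothetical helper names, computes determinants by expansion by minors along the first row (adequate only for small matrices) and checks that $\det(A^TA) = (\det A)^2$ for one sample matrix, and that a matrix with dependent columns has determinant 0.

```python
def det(M):
    """Determinant by expansion by minors along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]   # delete row 1, column j
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # sample matrix, det A = 18
print(det(A), det(matmul(transpose(A), A)))   # 18 324
```

Integer entries keep the arithmetic exact, so the comparison needs no tolerance.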


EXERCISES 7.2 
L ASI 3 A=Iti S. A =I, 1, 2; diagonatizable 
1 A= Si 9, X= 1, 1, 2, 3; not diagonalizable 


11. If $Av = \lambda v$, then $(A + aI)v = Av + av = (\lambda + a)v$.
13. If $Av = \lambda v$, then $A^2v = A(Av) = \lambda Av = \lambda^2 v$.
15. $\det(A^T - \lambda I) = \det((A - \lambda I)^T) = \det(A - \lambda I)$, since $\det(B) = \det(B^T)$.
17. The hint is sufficient. 
19. (a) Pv Ivo 
(b) Pu = Pot =0=0-0 
(©) A, = Pu = Pe +04) so Av! + Av! =o, and this equation, which 
implies (A — Iu! = Avs, so v'-(A — Nu! =v" Aut or (A — I)|jo! D2\l# = 0, so 
A=1=0 if of £0. If =0, vt -(\ — Net 08 -Av oF Aju? = 0. 




() (uyrirounuiri = [Be] uurw| Mi = Shin 


=[EelEsh]-0 3 
(e) $\det(A - \lambda I) = \det(Q^{-1}AQ - \lambda I)$, so apply part (d).
2. (a) 2425 = (ay + Ayilag + Bal) = (aeg — Bulb) + (Bs + Broad! 


ey “Aid ee Bh) (te Bala =iBa + Ba) 


Pym By + By —B Bs + a9 


Parts (b), (c), (d), (e), (f) are similar.
23. 6), —2 — 5,2 +37) 


- l+/ 

2 2 

Beat ane toeeat 0 
Tenewaiear Sieh 


27. Simply perform the multiplications. 


2. (a) a =2achore ot Pis[) 3] 


(b) a = 3,4 choice of P is (F A 


1 ve 0190 
(©) «=3,achoice of Pis]1 0 0], Then Pt=]—1 1 1 
0 Ot oo1 
vi -t 2-1 9 
(8) «=3,achoice of Pis]1 2-2]. Then P'=|-1 0 
2 =1 ost 


EXERCISES 7.3*


Pie e+(—Dnd =o) 


= Tks, V3 +(-DNI = V3) 


[1 —(—1)"}, eigenvalues are 2, 
7. The hint is sufficient.
8 Sy yrs s 
Eee =42 al ‘ 
‘The matrix is scale sa asi te yt of Propoiion 7.19 
\ 
0 i ; =x i i i 
oo} [ooo 


The matrix is stochastic but does not satisfy the hypothesis of Proposition 7.19
(although the conclusion of Proposition 7.19 is true).


13, 
1s. 




ees eal Fee al The matrix is not stochast 
(a) Adding the columns first (see hint) gives $w = v$.
(b) (AB)[;i] = A(B[;i]), so apply part (a).

(©) (rar 0B) 


(=4A)+ xB 
1+ xB Since +/A iS (14 pA) 01 
“8 


+ so apply exercise 16 


EXERCISES 7.4* 


9, 


(=D = DIAS A, 


Rotation present, reflection not present 
Rotation not present, reflection present 
Rotation present, reflection not present 
Rotation present, reflection not present 
Notice that $p(\lambda) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$, which has constant term $\lambda_1\lambda_2\cdots\lambda_n$.
¥ Z-AXNANG A: 0 
[1] QHSHLDR 1 BACKSUB(ECHELON A~10 3). 0 


{2} 20151}, «92180x-20((HQ)+ AY. xO) 12:2) 
. 
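The note on the constant term of $p(\lambda)$ can be seen concretely in the 2-by-2 case, where $p(\lambda) = \det(A - \lambda I) = \lambda^2 - (a + d)\lambda + (ad - bc)$. The Python sketch below (hypothetical helper names; the eigenvalue routine assumes the eigenvalues are real) computes the coefficients and confirms that the constant term equals the product of the eigenvalues, that is, $\det A$.

```python
import math

def char_poly_2x2(A):
    """Coefficients [1, c1, c0] of p(lambda) = det(A - lambda I) for a 2-by-2 A."""
    (a, b), (c, d) = A
    return [1, -(a + d), a * d - b * c]

def eigenvalues_2x2(A):
    """Roots of p(lambda) by the quadratic formula; assumes they are real."""
    _, c1, c0 = char_poly_2x2(A)
    r = math.sqrt(c1 * c1 - 4 * c0)
    return ((-c1 + r) / 2, (-c1 - r) / 2)

A = [[4, 1], [2, 3]]                 # sample matrix: p(lambda) = lambda^2 - 7 lambda + 10
l1, l2 = eigenvalues_2x2(A)
print(char_poly_2x2(A), l1, l2, l1 * l2)   # [1, -7, 10] 5.0 2.0 10.0
```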


EXERCISES 7.5* 


(a) Dy; center 1, radius $ 
Da: center 0, radius 16 
Dy: center 2, radius 1 
No, the discs overlap.
(b) $D_1$: center 1, radius 1
$D_2$: center -3, radius 2
$D_3$: center 5, radius 2
Yes, the three discs do not meet. Notice that $D_2$ for $A^T$ has center -3,
radius 0; that is, -3 is an eigenvalue.
(0) Dy: center 1, radius § 
Dz: center 2, radius 
Da: center 4, radius j. 
No, since D, and D, meet, The answer is yes for 47, however. 
(a) Center 0, radius 1 
(b) Center 0, radius 2, but notice that roots of g(x) are also roots of f(x) of part (a).
(c) Center 0, radius 1.1




a 


5. (b) For P= : 


¢ 


|. o-[ Jom 


where the entries of B will be very much smaller than n? for n sufficiently large. 
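The disc computations in Exercises 7.5 follow Gerschgorin's theorem: every eigenvalue of A lies in the union of the discs with centers $a_{ii}$ and radii $\sum_{j \ne i} |a_{ij}|$, and a disc disjoint from all the others contains exactly one eigenvalue. Applying the same construction to $A^T$ uses column sums instead. A minimal Python sketch, with hypothetical function names and a sample matrix not taken from the exercises:

```python
def gerschgorin_discs(A, use_columns=False):
    """(center, radius) for each Gerschgorin disc of the square matrix A.
    Row discs by default; with use_columns=True, the discs of A^T (column sums)."""
    M = [list(c) for c in zip(*A)] if use_columns else A
    return [(M[i][i], sum(abs(x) for j, x in enumerate(M[i]) if j != i))
            for i in range(len(M))]

def discs_disjoint(discs):
    """True when no two of the closed discs meet."""
    for i in range(len(discs)):
        for j in range(i + 1, len(discs)):
            (c1, r1), (c2, r2) = discs[i], discs[j]
            if abs(c1 - c2) <= r1 + r2:
                return False
    return True

A = [[1, 1, 0], [1, -3, 1], [1, 0, 5]]   # sample matrix
print(gerschgorin_discs(A))                      # [(1, 1), (-3, 2), (5, 1)]
print(discs_disjoint(gerschgorin_discs(A)))      # True
```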


EXERCISES 7.6*


= avr avr 
ge are | 
f 1 
3 oO = i) 
ol 
5, y(t) = Mel + 6%) = cosh (1) 7. y(n) = e(eost — sind) 


EXERCISES 7.7 


2,34 3 A=0,123 5.4 
2 N=t—-h-h-l 9 N=12324 Wd 


h-l 
$2,324 


13, xt 8 1x8 pe $6 = (x = WN + I = 3x +2) 
15. $x^5 - 15x^4 + 85x^3 - 225x^2 + 274x - 120 = (x - 1)(x - 2)(x - 3)(x - 4)(x - 5)$
17. $x^4 - 8x^3 + 42x^2 - 80x + 125 = (x^2 - 6x + 25)(x^2 - 2x + 5)$
19, x — ax — x4 + 30x3 — 66x? + 64x — 24 = 
(x = De = 24x + 3? = +) 
Note: Although the double eigenvalue 2 is computed somewhat inaccurately, the
polynomial $(x - 2)^2 = x^2 - 4x + 4$ is computed quite accurately by CXCOEF.
21. The hint is sufficient.
23. ¥ Z0XEIG A :P 
(1) Anat nh pBP HD 0,01 SIAES 1 
(2) BC (ALR:2 442) KAT 21) 2 
(31 2-(A[2;21.2)-2 
9 


25. (-21A)DBLSHFT A or just A, since -21A isa zero matrix. It follows that A™ = Jand 
so Q is the result of HHLDR working on a matrix of the form aI. In this case HHLDR returns
an identity matrix.


27. The hint is sufficient. 


29. The hint is sufficient. 
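The factorizations in Exercises 7.7 of $x^5 - 15x^4 + \cdots$ and $x^4 - 8x^3 + \cdots$, and the note's $(x - 2)^2 = x^2 - 4x + 4$, can be checked by multiplying out the factors. A short Python sketch (hypothetical helper names; coefficients are listed from the highest degree down):

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, highest degree first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def from_roots(roots):
    """Coefficients of (x - r1)(x - r2)...(x - rn)."""
    p = [1]
    for r in roots:
        p = poly_mul(p, [1, -r])
    return p

print(from_roots([1, 2, 3, 4, 5]))             # [1, -15, 85, -225, 274, -120]
print(poly_mul([1, -6, 25], [1, -2, 5]))       # [1, -8, 42, -80, 125]
```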


APPENDIX B 


A Short List 
of APL Functions 


Primitive Scalar Functions


Symbol  Monadic function                      Dyadic function
+       Identity (the graph is y = x)         Addition
-       Negation                              Subtraction
×       Signum: ×x is 1 if x>0,               Multiplication
        0 if x=0, ¯1 if x<0
÷       Reciprocal                            Division
*       Exponential function                  Power
⌈       Ceiling                               Maximum: a⌈b is max(a, b)
⌊       Floor (greatest integer function)     Minimum
|       Absolute value: |¯3 is 3              Residue (remainder): 3|8 is 2
!       Factorial                             Binomial coefficient: x!y is the number
                                              of combinations of y things taken x at a time
○       Pi times                              Trigonometric and hyperbolic functions
                                              (see below)
?       Roll: ?5 is a random integer          Deal: 5?52 picks 5 random integers
        between 1 and 5                       from 1 to 52 without replacement


576 Appendix B


Trigonometric and Hyperbolic Functions

Symbol  Function       Symbol  Function
0○x     √(1 - x²)
1○x     sin x          ¯1○x    arcsin x
2○x     cos x          ¯2○x    arccos x
3○x     tan x          ¯3○x    arctan x
4○x     √(1 + x²)      ¯4○x    √(x² - 1)
5○x     sinh x         ¯5○x    arcsinh x
6○x     cosh x         ¯6○x    arccosh x
7○x     tanh x         ¯7○x    arctanh x


Logical functions (all dyadic except ~)

∧   and            =   equal
∨   or             ≠   not equal
<   less           ⍲   nand
>   greater        ⍱   nor
≤   not greater    ~   not
≥   not less
Miscellaneous Functions

Symbol  Name         Monadic / Dyadic
⌹       domino       Monadic: ⌹A is a left inverse for A.
                     Dyadic: B⌹A is the least-squares solution of AX = B.
⍳       index,       Monadic: ⍳N is the vector of integers 1 to N.
        index of     Dyadic: V⍳A gives the index of the first occurrence of each
                     component of A in the vector V.
⍴       shape,       Monadic: ⍴A is a vector giving the shape of A.
        reshape      Dyadic: V⍴A reshapes A to an array of shape V.
⍉       transpose    Monadic: ⍉A is Aᵀ.
                     Dyadic: 1 1⍉A is the main diagonal of A.
↑       take         Dyadic: V↑A extracts components from A in the manner specified by V.
↓       drop         Dyadic: V↓A deletes components from A in the manner specified by V.
∊       member       Dyadic: A∊B flags the components of A that are components of B.
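The dyadic domino returns a least-squares solution. One way to reproduce that result outside APL, shown here as a hedged Python sketch rather than a description of how any APL system actually computes ⌹, is to solve the normal equations $A^TA\,x = A^Tb$ by Gaussian elimination with partial pivoting. It assumes A has linearly independent columns; the helper names are hypothetical.

```python
def solve(M, b):
    """Solve the square nonsingular system M x = b by Gaussian elimination
    with partial pivoting followed by back-substitution."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))   # partial pivoting
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares(A, b):
    """Minimize |Ax - b| by solving the normal equations A^T A x = A^T b.
    Assumes the columns of A are linearly independent."""
    cols = [list(c) for c in zip(*A)]                  # columns of A = rows of A^T
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    Atb = [sum(p * q for p, q in zip(ci, b)) for ci in cols]
    return solve(AtA, Atb)
```

For the data points (0, 1), (1, 2), (2, 2), `least_squares([[1, 0], [1, 1], [1, 2]], [1, 2, 2])` gives the line y = 7/6 + t/2, up to rounding. Forming $A^TA$ squares the condition number, which is one reason orthogonal factorizations are usually preferred for ill-conditioned problems.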


Short List of APL Functions 577 


Miscellaneous Functions (Continued)

Symbol  Name         Monadic / Dyadic
/       compress     Dyadic: L/A deletes components from A corresponding to the zeros
                     in the logical vector L. Deletes columns from matrices.
⌿       compress     Dyadic: same as / but deletes rows from matrices.
\       expand       Dyadic: L\A expands A by putting zeros in the places corresponding
                     to the zeros in the logical vector L. Adds columns to matrices.
⍀       expand       Dyadic: same as \ but adds rows to matrices.
⍒       downgrade    Monadic: H←⍒V defines H to be the vector of indices such that
                     V[H] has components arranged in descending order.
⍋       upgrade      Monadic: H←⍋V defines H to be the vector of indices such that
                     V[H] has components arranged in ascending order.
→       goto         Monadic: →3 means GOTO line 3.
,       ravel,       Monadic: ,A strings A out into a vector.
        catenate     Dyadic: A,B sticks arrays A and B together.
⌽       reverse,     Monadic: ⌽A reverses the order of the components or columns of A.
        rotate       Dyadic: V⌽A shifts row entries in the manner specified by V.
⊖       reverse,     Monadic: ⊖A reverses the order of the components or rows of A.
        rotate       Dyadic: V⊖A shifts column entries in the manner specified by V.
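Compress, expand, and the grade functions translate directly into list operations. The Python sketch below uses hypothetical names and assumes APL's default index origin of 1, so the grades return 1-origin indices. Python's sort is stable, which matches APL's rule that equal components keep their original relative order.

```python
def compress(mask, v):
    """L/V: keep the components of v in positions where mask is 1."""
    return [x for m, x in zip(mask, v) if m]

def expand(mask, v, fill=0):
    """L\\V: lay out v in the positions where mask is 1, filling the rest."""
    items = iter(v)
    return [next(items) if m else fill for m in mask]

def grade_up(v):
    """Grade up: 1-origin indices H such that v[H] is in ascending order."""
    return [i + 1 for i in sorted(range(len(v)), key=lambda i: v[i])]

def grade_down(v):
    """Grade down: 1-origin indices H such that v[H] is in descending order."""
    return [i + 1 for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True)]

print(compress([1, 0, 1, 1], [10, 20, 30, 40]))   # [10, 30, 40]
print(expand([1, 0, 1, 0, 1], [7, 8, 9]))          # [7, 0, 8, 0, 9]
print(grade_up([30, 10, 20]))                      # [2, 3, 1]
```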




Operators

Symbol  Name            Result
/       reduction       f/V is V[1] f V[2] f ... f V[n], where f is any primitive dyadic
                        function; +/ is Σ, and ×/ is Π. Reduces the rows of matrices.
⌿       reduction       Same as / but reduces columns of matrices.
.       inner product   Takes two primitive dyadic functions as arguments:
                        (A f.g B)[I;J] is f/A[I;] g B[;J]. +.× is matrix multiplication.
∘       jot             Turns the inner-product operator into an outer-product operator:
                        (A∘.f B)[I;J] is A[I] f B[J].
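Because APL evaluates right to left, f/V groups as V[1] f (V[2] f (⋯ f V[n])), so -/1 2 3 is 2 rather than ¯4. The Python sketch below (hypothetical names, vectors as lists, matrices as lists of rows) models reduction, the inner product f.g, and the outer product ∘.f for any two-argument Python functions; it requires nonempty vectors.

```python
from functools import reduce
import operator

def apl_reduce(f, v):
    """f/V for a nonempty list v, grouped right to left as in APL:
    v[0] f (v[1] f (... f v[-1]))."""
    return reduce(lambda acc, x: f(x, acc), reversed(v[:-1]), v[-1])

def inner(f, g, A, B):
    """A f.g B: entry [i][j] is f/ applied to (row i of A) g (column j of B)."""
    cols = list(zip(*B))
    return [[apl_reduce(f, [g(a, b) for a, b in zip(row, col)]) for col in cols]
            for row in A]

def outer(f, u, v):
    """u jot-dot f v: entry [i][j] is u[i] f v[j]."""
    return [[f(a, b) for b in v] for a in u]

# +.x is ordinary matrix multiplication.
product = inner(operator.add, operator.mul, [[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)                               # [[19, 22], [43, 50]]
print(apl_reduce(operator.sub, [1, 2, 3]))   # 2
```

Passing other functions, for example `inner(max, operator.add, A, B)`, gives the generalized inner products that APL allows with any pair of primitive dyadic functions.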


APPENDIX C 


Some Miscellaneous APL 


The mechanics of writing, editing, and debugging APL functions vary with the computer
system used. For IBM systems any commercially published APL manual will do. For
other systems the computer-center personnel should be consulted. A few facts may hold
generally, however.


SUSPENDED FUNCTIONS 


When an error occurs in the execution of a user-defined function, the function is sus-
pended. Suspended functions often cause confusion for beginners because the workspace is
in a subtly altered state.

When a function, for example GAUSS of Section 3.5, suspends, an error message
accompanied by the line on which the error occurs prints out. For example,


nis sts 


SYNTAX ERROR 
GAUSS [4}° AcA.IS, 117 


(The problem here is that a nonexistent function, IS, has been called.) At this point all the
local variables in GAUSS that have already been assigned values (A, B, P, L, 7) are alive
in the workspace. They "mask" any other variables with these names that may have been
defined before the function was called. Thus, for example, a matrix P may be mysteriously
changed to the scalar 0 (see line [2] of GAUSS). When the suspension is removed, however,
the original P will be available again. The confusion is compounded by the fact that the
suspended function may be executed again and again, resulting in additional suspensions.
If a function has been suspended more than once, however, it cannot be edited or
displayed! At this point there is a real danger that you may hurt yourself while destroying the
keyboard.
When an error occurs and a function suspends,

1. Do not call the function again.
2. Display the local variables (type their names) if you think that knowing their values
may help to find the trouble.




580 Appendix C 


3. Remove the suspension. This is usually done by typing →0 or just → (called the niladic
branch). In some systems special action may be necessary when a recursive function
suspends.


To get a list of suspended functions, type )SI. (SI stands for state indicator. An
SI DAMAGE message usually means that you must remove some function suspensions
before the machine will let you proceed.)


TRACE VECTOR, STOP VECTOR 


The trace vector and stop vector are debugging aids. The trace vector for the function
GAUSS is denoted T∆GAUSS and the stop vector for GAUSS is denoted S∆GAUSS. These
so-called vectors are lists of line numbers. Typing

T∆GAUSS←2 3 8

sets trace flags on lines 2, 3, and 8 of GAUSS. Whenever these lines of GAUSS are
executed, a message to that effect is printed and the last quantity computed on the line is
printed. For example,


GAUSS[2] 0


To turn off tracing, use the expression T∆GAUSS←⍳0. The stop vector is similar to the
trace vector. It is used to suspend a function at a specific line in order to inspect the values
of the local variables. The expression S∆GAUSS←5 will cause GAUSS to suspend just before
line [5]. After the local variables have been inspected, the function may be restarted on
line [5] by typing →5.

Some systems do not have trace and stop vectors.


SOME SYSTEM COMMANDS 


)CONTINUE
Signs off APL and saves contents of current workspace for automatic reload at next
signon.


)COPY 240 ROWREDUCTION GAUSS PROBLEM
Goes to Library 240, workspace ROWREDUCTION, finds the function GAUSS and the
matrix PROBLEM, and adds them to the active workspace.
)DIGITS 3
Causes all numbers to be displayed to three significant digits. Does not affect the
number of digits used for computation or storage. An alternate command is ⎕PP←3.
)ERASE GAUSS PROBLEM
Erases the function GAUSS and the variable PROBLEM from the active workspace.
)LIB
Lists your saved workspaces. )LIB 6 lists the workspaces in public library 6.
)LOAD SAM
Loads your saved workspace SAM into the active workspace. Destroys current
contents. )LOAD 6 SAM gets WS SAM from public library 6.


Some Miscellaneous APL 581


)OFF 
Signs off APL. Current contents of active workspace discarded. 


)SAVE SAM
Saves current workspace (which may be called SAM, CLEAR, or CONTINUE) under the
name SAM.


)SI
List of pendent and suspended functions. Suspensions denoted by *. (Pendent
functions are waiting for results from another function.)


Index 


active constraints, 403 
adjoint formula, 467 
affine 
approximation of a function, 113 
coordinate systems, 221 
function, 98, 314-315, 501
geometry, 200
quadratic function, 102-103 
ANG, APL function, 315 
angle between vectors, 284 
area, signed, 449-451 
argument, of an APL function, 122 
artificial variable, 422
associative law, 63-64
AT, APL function, 126
autonomous system, of differential
equations, 311-312
AVE, APL function, 127 


BACKSUB, APL function, 357 
back-substitution, 383-357 
basic column, 418 

basic feasible solution, 418 
basic variable, 418 

basis, of a subspace, 270 
bourgeois strategy, 444 


calculations, not exact, 73 
cancellation, for matrix equations, 65, 82
Cartesian equations, of lines and planes,
216-218 
caste, in an ant colony, 432 
catenation 
for matrices, 39-40 
for vectors, 23-24 
center of gravity 
of a set of points, 362 
of a triangle (exercise 24), 220 
centroid, 362 
CENTSECT, APL function, 186 
chain rule for derivatives, 109
checking machine computation, 148
CHOP, APL function, 126
column space, 273-275
column vectors, 52
commutative law, 58
companion matrix, 505
comparison tolerance, 74


components, of a vector, 16-17 
composite function, 101 
compression 

for matrices, 144 

for vectors, 143 
congruence, 289 

in space, 497 
congruent figure 
constraint flat, 403
constraints, in linear programming, 402 

403 

convex set, 404 
coordinate change 

linear, 292 

orthonormal, 293 

preserving dot products, 292 
coordinate change formula 

affine function, 228-229, 262 

for a k-flat, 261

in the plane. 

quadratic function, 238, 263

rotation of axes, 226-227 

3 
n of origin, 226 
2by-2 symmetric ma 
relation, 287-288 
Cramer's rule, 466-467 
critical point
of a differential equation, 118
of a function, 110
for a quadratic function, 307


294-295 


DBLSHFT, APL function, 332
DEGFIT, APL function (exercise 15), 130
degree, of term, 99, 103 
derivative, 106-107 
chain rule, 109 
DET. APL function, 458 
determinant, 454 
as a signed area, 449-451 
difference equations, 182-183 
differential equations, 96-97, 117-119 
critical points, 118 
linear autonomous system
dimension, of a flat, 251 
distance, 281 
domain, of function, 95, 121 


s S1LeSI2 = 


583 


584 Index 


domino 
dyadic, 76 
monadic
dot product, 280 
generalized (exercise 32), 317 
drop 
for matrices, 144 
for vectors, 143 
dual cone, 400 
dual linear programming problems, 378
410-41 
dual problem, 378
dyadic function, 5


ECHCHK, APL function, 180 
ECHELON, APL function, 161, 175 
eigenspace, 473 
eigenvalue, 308, 473
dominant, 524
final definition, 478 
multiplicity of, 475 
eigenvalues 
conjugate pairs, 521
ill conditioned, 539 
eigenvector, 308, 473 
elementary matrix, 138 
ellipse, 240-241 
degenerate, 244-245 
empty vector, 26


reduction of (exercise 50), 29 
equality 

convention, 46 

fuzzy, 170 


of vectors, 60 
Eratosthenes, sieve of (exercise 46), 50 
evolutionarily stable strategies, 441-445 
expected payoff, in a matrix game, 436


factorization, of a matrix, 343 
Farkas’ lemma, 401 
feasible point, of a matrix inequality, 402 
Fibonacci numbers, 485
fitness, Darwinian, 431 
flat 
constraint, 403 
defined by a set of points, 253 
k-dimensional, 251 
as solutions of a system of equations, 
259, 269 
function 
affine approximation of, 113 
affine linear, 98 
affine quadratic, 102-103 
composition, 101
critical point, 110 


function (cont'd) 
definition, 121 
derivative, 106-107
domain of, 95, 121 
dyadic, 5 
gradient, 105 
image of, 95, 122 
Jacobian, 107 
linear, 98 
linearization of, 113 
logical, 10-13 
monadic, 5 
nonhomogeneous linear, 98 
range of, 121 
recursive, 171 
relational, 10-13 
second derivative, 110 
Taylor expansion of, 114 
vector valued, 95 

fuzz, 74 

fuzzily, equal, 170 


GAUSS, APL function, 161, 177 

Gaussian reduction, 140

generalized inner product, 60 
generating set, for a subspace, 270 
Gerschgorin disc, 502 

Gerschgorin’s theorem, 503 

goal programming, 390-391 

GOTO, APL branching instruction, 166 
gradient, 106 

Gram-Schmidt process, 331


half-space, 397
hawk and dove, 442 
header, of an APL function
Hessian matrix, LL 
HHLDR, APL function, 347 
high minus, 2 

not a function, 6 

reason for, 19 
Hilbert matrix, 45, 180 
Householder 

algorithm, 345-347 

transformation, 345 
HSHLDR, APL function, 347 
hyperbola, 240-241 
hyperplane, 251


ID, APL function, 122

identity matrix, 47, 66, 122

image 
of a flat under an affine function, 259 
of a function, 95, 122 

index generator, 24-26 


indexing 
for matrices, 31-33, 35-36 
for vectors, 26-27 
index-of, primitive APL function, 176
inner product, generalized, 60-61 
intersection 
of flats, 254-256 
of lines, 206-208 
of planes, 213-215 
invariant subspace, 519
inverse 
computation of, 148-150 
of a matrix, 66 
of a matrix in APL, 73 
invertible matrix, 66, 160 
iota, dyadic, 176
irredundant system of linear equations, 
163 
isometry, 289, 487 


JACFIND, APL function, 305 
Jacobi algorithm, 302 

accuracy of, 304
Jacobian matrix, 107 
JACOBI, APL function, 304-308 
Jordan block, 480-481 


Kronecker product, 44 


labels, in APL functions, 169-170 
lamination, primitive APL function, 196
LDR, APL function, 144 
leading ones, 139 
least squares 
calculation with vectors, 18 
exponential function (exercise 14), 95 
polynomial, 89 
polynomial, APL function (exercise 


15), 130 
power function (exercise 15), 95 
solutions, 86, 111-112, 352-353 


straight line, 14-16, 86-88 
left argument, of an APL function, 125
left inverse, 66, 159 
length, of a vector, 281
level curves, 240
linear combination, 22-23 
of linear equations, 163-164 
linear equations, redundant, 163 
linear function, 98, 100
linearization 
of a differential equation, 117-118
of a function, 113 
linearly dependent vectors, and coplanar 
points, 224 




linearly independent vectors, 72, 157 
logical vectors, 143 


machine computation check, 148 

main diagonal, of a matrix, 58 

Markov chain matrix, 183 

masks, 143 

matrix 
catenation, 39-40 
without columns, 155 
column space, 273-275 
companion, 505
computation of inverse, 148-150
diagonally dominant (exercise 4), 510
dual cone of, 400
echelon form of, 138-139
eigenvalues (symmetric case), 308
eigenvectors (symmetric case), 308 
elementary, 138 
game, 435 
Hilbert, 180 
identity, 66,
inverse in APL,
inverses, 66, 160
invertibility, 68, 69, 72
left inverse, 159
main diagonal, 58 
of a Markov chain, 183 
multiplication by scalars, 38 
multiplier, 138 
null space, 273-275 
orthogonal, 289 
parallel processing, 36-37 
partitioned, 139 


2B 


positive, 310 

positive definite, 310 

power function, 171-172 

projection, 336 

pseudo-inverse, 73 

rank of, 161 

of rank one (exercise 21), 165 

reduction, 40-41 

reflection, 336 

without rows or columns (exercise 8), 
2 

row space, 273-275 

singular value decomposition (exercise 
29), 316-317 

singular values, 312 

singular vectors, 315 

skew-symmetric (exercise 10), 446 

sparse, 193 

square, 57 




matrix (cont‘d) 

stochastic, 183-184, 489-492, 506-508 

switch, 138 

symmetric, 60 

tridiagonal, 194

upper echelon, 342-343 

of zeros, 70 
matrix-matrix product, 55 

in APL, 56 
matrix-vector product, 52 

in APL, 54 
MAX, APL function, 421 
maximization problem, 364 
maximum problem, in linear 

programming, 377 
max-min test, 109 
minimax theorem, 438 
minimization problem, 362 
minimum problem, in linear 
programming, 378 

minors, expansion by, 464-466 
moment of inertia, of a set of points, 362 
monadic function, 5
multiple regression, 90-92 
multiplicity, of an eigenvalue, 475 
multiplier matrix, 138
multivariate calculus sections, 51 


name, of an APL function, 122 
negative numbers, representation of, 2
negative part of a variable, 385-386 
negligible compared to A. 170 
NEWTON, APL function, 189 
Newton’s method, 114-117, 189-190 
nonlinear equations, 96 
nonsingular matrix, 66
norm 

Ny, 386 

fy, 387 

1. 387 

of a vector, 281
normal equations, 15 

derivation of (exercise 60), 29 
null space, of a matrix, 273-275 


objective function 
of a maximum problem. 377 
of a minimum problem, 378 
nonlinear, 384-386 
optimal strategy, 438-439
order of evaluation, §
ORTHO, APL function, 349
orthogonal complement, of a set of
vectors, 323


orthogonal matrix, 289
orthogonal vectors, 293
orthonormal coordinate system, 293
outer product, 44-46 


parabola, 245-246 
parallel
flats, 269
lines, 208-211 
planes, 215-216 
parallel component, with respect to a 
subspace, 325 
parametric representation 
of a line, 204 
of a plane, 212-213 
partial pivoting, 173 
partitioned index vector, 302 
partitioned matrix, 139 
payoff matrix, 435
PERP, APL function, 351
perpendicular component, with respect to
a subspace, 325 
perpendicular vectors, 284 
PIVOT, APL function, 144 
pivot, columns, 139 
pivoting, 135 
partial, 175 
pivot matrix, 134 
PIVOT2, APL function (exercise 66), 183
POLYAT, APL function, 476
polynomial, characteristic, 473 
polynomials 
evaluation of, 21-22, 90, 126-127 
roots of, 505-506, 535-536 
positive definite matrix, 310
positive matrix, 310 
positive part of a variable, 385-386 
positivity condition, in linear 
programming, 377, 378 
power method, for eigenvalues, 524 
PRANES, APL function, 369 
primal problem, 377 
Principal axes, 368 
Principal-component analysis, 372-373 
principal k-flat, 368
probability vector, 436 
Projection, onto a flat, 337 
Projection matrix, 336 
projective representation 
of affine functions, 229, 236-237 
in higher dimensions (exercises 27-36), 
267 
of quadratic functions, 249-250 
pseudo-inverse, 73, 85 
pure strategy, 436 


Pythagorean theorem, 286 


QR
algorithm, 526 
APL function, 528 
factorization, of a matrix, 343 
quadratic form, 103 
quadratic forms, simultaneous 
diagonalization (exercise 34), 317 
quadratic function, 102-103 
eliminating linear term, 239 
eliminating xy-term, 238-239 


random numbers, 313 
range, of a function, 121 
rank 
equal to one (exercise 21), 165 
of a matrix, 161 
Rayleigh quotient, 368 
Rayleigh’s principle, 368 
recursive function, 171 
reduction 
for matrices, 40-41 
for vectors, 20-21
redundant constraint, 417 
redundant system of linear equations, 163 
reflection 
elementary, 345 
in a flat, 337
Householder, 345
in a line, 230-231
matrix, 336
special, 345
in a subspace, 336
residuals, 75 
result, of an APL function, 122 
right argument, of an APL function, 125
right inverse, 66
rock-scissors-paper, game, 434-435 
roots 
of polynomials, 505-506, 535-536 
by sectioning, 185-188 
rotation, of axes, 226-227 
row echelon form, 138-139 
row reduction, 140 
row space, of a matrix, 273-275 


saddle point, 318 
for a quadratic function, 307 
scalar multiplication, geometric, 201-202

scan, APL operator, 529-530 
Schwarz inequality, 283

POW. APL function, 532 

SD, APL function, 127 




second derivative, 110 
test, HL 
shifting, in QR algorithm, 527-528 
sieve of Eratosthenes (exercise 46), 50, 
similarity, 289 
simple harmonic motion, 515-517
simplex algorithm, statement of, 420 
simplex method, 419-420 
Simpson's rule, 131 
singular values, of a matrix, 312, 501
singular vectors, of a matrix, 31
skew-symmetric, matrix (exercise 10), 446
slack variables, 412
sparse matrix, 193 
special reflection, 345 
SPLINE, APL function, 197 
SPRFLCTR, APL function, 345 
square roots, on a calculator, 117 
standard deviation, APL function, 127
standard scores, 41-44 
standard scores, APL function, 127 
stochastic matrices, 183-184, 489-492,
506-508
strategies 
bourgeois, 444 
evolutionarily stable, 443 
in game theory, 436 
hawk and dove, 442 
optimal, 438-439 
pure, 436 
stretch, affine function, 314 
STRT, APL function, 531 
subspace, 257 
associated to a flat, 268 
generated by a set of vectors, 270 
invariant, 519 
sum of products, 54 
sum of squares, 60 
regression, 328 
residual, 328 
total, 327 
sums 
double, 45-46 
single, 24-26 
surplus variables, 412 
SWITCH, APL function, 144 
switch matrix, 138 
symmetric matrix, 60 


tableau, 416 
take 
for matrices, 144 
for vectors, 143 
Taylor expansion, of a function, 114
tennis racket, 370-372 




TOTHE, APL function, 171-172
trace vector, 305, 
transition probability, 183 
translation 

affine function, 209 

of origin, 226 
transpose 

of a matrix, 58

of a product, 59
triangle inequality, 283 
tridiagonal matrix, 194 
TRIDI, APL function, 195 


unit vector, 293 
upper echelon matrix, 342-343 


variable, 4 
local, 124
vector 


without components, 26


vector (cont'd) 
equality, 60-61 
geometric addition, 201-202 
length, 281
linearly independent set, 157 
logical, 143 
norm, 281 
parallel processing, 17 
probability, 436 
reduction, 20-21 

vertex, 403 
correspondence with tableau, 416- 

417 

finding, 422-423 

volume, of a parallelopiped, 460 


zero matrix, 70 

zero-sum game, 435 

ZSCORE, APL function, 124, 127 
scores, 41-44 


